Discussion: select count(*) performance
Hello Group,

I'm new to PostgreSQL, so please forgive a "newbie" question. I have a table with about 1,250,000 records. When I execute "select count(*) from table" (with pgAdmin III) it takes about 40 seconds. I think that is much too long. Can you give me hints on where to look for improvements?

TIA, Det
On 8/8/07, runic <runic@gmx.de> wrote:
> Hello Group,
>
> I'm new to PostgreSQL, so please forgive a "newbie" question. I have
> a table with about 1,250,000 records. When I execute "select count(*)
> from table" (with pgAdmin III) it takes about 40 seconds. I think
> that is much too long. Can you give me hints on where to look for
> improvements?

This is a FAQ. This operation is optimized in some other database engines, but not in PostgreSQL, due to the way its concurrency control works. There are many workarounds; maybe the easiest is to get an approximate count using

    select reltuples from pg_class where relname = 'your_table' and relkind = 'r';

merlin
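A minimal sketch of the workaround Merlin describes, assuming a hypothetical table named orders; reltuples is stored as a float and is only refreshed by VACUUM or ANALYZE:

    -- Refresh the statistics so the estimate is reasonably current.
    ANALYZE orders;

    -- Approximate row count; reltuples is a float, so cast it for display.
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE relname = 'orders' AND relkind = 'r';

Unlike count(*), this reads a single pg_class row, so it returns almost instantly regardless of table size.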
runic wrote:
> Hello Group,
>
> I'm new to PostgreSQL, so please forgive a "newbie" question. I have
> a table with about 1,250,000 records. When I execute "select count(*)
> from table" (with pgAdmin III) it takes about 40 seconds. I think
> that is much too long. Can you give me hints on where to look for
> improvements?
>
> TIA, Det

1) VACUUM FULL the table, maybe the whole database.
2) Buy more/faster hard disks.

The problem is that count(*) on a table has to scan the whole table, because Postgres uses MVCC for its concurrency control. This is normally a huge win, but one of the few places where it's a loss is doing count(*) over a whole table. In that case, Postgres has no choice but to inspect each and every row to see whether it's live, and thus no choice but to read the whole table.

If you've been doing a lot of inserts, updates, and/or deletes on the table, and you either don't have autovacuum turned on or it isn't aggressive enough, the table can be littered with dead rows that haven't been removed yet. Postgres still has to read those rows to make sure they're dead, so it can easily end up reading many multiples of the number of live rows in the table. What vacuum does is go through and remove those dead rows.

If that isn't the problem, then it's simply that you have to read the whole table. If the rows are large enough, and the disk subsystem is slow enough, this can just take a while. My advice in that case is to buy more and/or faster disks, to speed up reading the table.

Brian
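A quick way to check whether dead rows are the culprit, along the lines Brian describes (your_table is a placeholder); VACUUM VERBOSE reports how many dead row versions it found:

    -- Reclaim dead rows and report what was found.
    VACUUM VERBOSE your_table;

    -- If the table is already badly bloated, rewrite it compactly.
    -- Note: VACUUM FULL takes an exclusive lock for the duration.
    VACUUM FULL your_table;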
>>> On Fri, Aug 10, 2007 at 8:08 AM, in message <46BC6342.4010002@janestcapital.com>, Brian Hurt <bhurt@janestcapital.com> wrote:
> runic wrote:
>
>> I have a table with about 1,250,000 records. When I execute
>> "select count(*) from table" (with pgAdmin III) it takes about 40
>> seconds. I think that is much too long. Can you give me hints on
>> where to look for improvements?
>>
>> TIA, Det
>
> 1) VACUUM FULL the table, maybe the whole database.
> 2) Buy more/faster hard disks.

Det,

Forty seconds is a long time for only 1.25 million rows. I just ran a count against a production database, and it took 2.2 seconds to get a count from a table with over 6.8 million rows. In addition to the advice given by Brian, I would recommend:

3) Make sure you are using a recent version of PostgreSQL. There have been significant performance improvements lately. If you're not on 8.2.4, I'd recommend you convert while your problem table is that small.

4) Make sure you read up on PostgreSQL configuration. Like many products, PostgreSQL ships with a default configuration designed to start on just about anything, but it will not perform well without tuning.

5) Consider whether you need an exact count. I just selected the reltuples value from pg_class for the table with the 6.8 million rows, and the value was off from the exact count by only 0.0003%. That's close enough for many purposes, and the run time is negligible.

6) If you're looking at adding hardware, RAM helps. It can help a lot.

I'll finish by restating something Brian mentioned: VACUUM. Use autovacuum. You should also run a scheduled VACUUM ANALYZE, under the database superuser login, on a regular basis; we do it nightly on most of our databases. Without proper maintenance, dead space will accumulate and destroy your performance.

Also, I don't generally recommend VACUUM FULL. If a table needs aggressive maintenance, I recommend CLUSTER, followed by an ANALYZE. It does a better job of cleaning things up, and is often much faster.

I hope this helps.

-Kevin
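A sketch of the CLUSTER route Kevin recommends, using the 8.2-era syntax; table and index names are placeholders, and CLUSTER holds an exclusive lock on the table while it runs:

    -- Rewrite your_table in the order of the index your_table_pkey,
    -- dropping dead space along the way.
    CLUSTER your_table_pkey ON your_table;

    -- Refresh planner statistics after the rewrite.
    ANALYZE your_table;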
On Aug 11, 5:54 pm, Detlef Rudolph <ru...@gmx.de> wrote:
> Hello Group,
>
> I've tried the VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL improves performance down
> from about 40 secs to 8. I think in future I would
> use the reltuples value from pg_class for the table.
>
> Thanks a lot for your answers and a good Sunday,
> Det

Just do not forget that reltuples is counted and updated in pg_class only during the vacuuming or analyzing of a table... so the value is only an APPROXIMATION.

-- Valentine
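A hypothetical demonstration of Valentine's point that reltuples lags behind the real row count until the table is vacuumed or analyzed:

    CREATE TABLE reltuples_demo (id int);
    INSERT INTO reltuples_demo SELECT generate_series(1, 100000);

    -- Stale: no VACUUM or ANALYZE has run yet, so the new rows are not counted.
    SELECT reltuples FROM pg_class WHERE relname = 'reltuples_demo';

    ANALYZE reltuples_demo;

    -- Now approximately 100000.
    SELECT reltuples FROM pg_class WHERE relname = 'reltuples_demo';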
>>> valgog <valgog@gmail.com> 08/13/07 6:38 AM >>>
On Aug 11, 5:54 pm, Detlef Rudolph <ru...@gmx.de> wrote:
>
> I've tried the VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL improves performance down
> from about 40 secs to 8.

Det,

I don't think anyone meant to suggest that VACUUM ANALYZE would improve the count speed on a table which had already become bloated, but its routine use would PREVENT a table from becoming bloated. Once bloat occurs, you need more aggressive maintenance, like VACUUM FULL or CLUSTER.

Also, VACUUM FULL tends to cause index bloat, so you may now be seeing performance issues in other queries. You will probably need to REINDEX the table or use CLUSTER to clean that up.

-Kevin
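A sketch of the cleanup Kevin suggests after a VACUUM FULL has bloated the indexes (the table name is a placeholder; REINDEX locks the table while it rebuilds):

    -- Rebuild all indexes on the table to remove index bloat.
    REINDEX TABLE your_table;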
Hello Group,

I've tried the VACUUM ANALYSE; that doesn't help much, but VACUUM FULL improves performance down from about 40 secs to 8. I think in future I would use the reltuples value from pg_class for the table.

Thanks a lot for your answers and a good Sunday,
Det
On 8/11/07, Detlef Rudolph <runic@gmx.de> wrote:
> Hello Group,
>
> I've tried the VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL improves performance down
> from about 40 secs to 8.

If VACUUM FULL fixes a performance problem, then you have a regular vacuum problem of some sort. Make sure regular vacuum runs often enough, and make sure your fsm settings are high enough to allow it to reclaim all deleted tuples when it does run.
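One way to inspect the FSM settings in question (a sketch; max_fsm_pages and max_fsm_relations exist in the 8.x line discussed here and were removed in 8.4, and raising them means editing postgresql.conf and restarting the server):

    -- Show the current free-space-map limits.
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;

    -- A database-wide VACUUM VERBOSE ends with a summary of how many pages
    -- the free space map is tracking, which shows whether these limits are
    -- large enough.
    VACUUM VERBOSE;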