Discussion: select count(*) performance


select count(*) performance

From:
runic
Date:
Hello Group,

I'm new to PostgreSQL, so please forgive a newbie
question. I have a table with about 1,250,000 records. When I execute
"select count(*) from table" (with pgAdmin III) it takes about 40
seconds.
I think that takes much too long. Can you please give me hints
on where I can look for improvements?

TIA, Det


Re: select count(*) performance

From:
"Merlin Moncure"
Date:
On 8/8/07, runic <runic@gmx.de> wrote:
> Hello Group,
>
> I'm new to PostgreSQL, so please forgive a newbie
> question. I have a table with about 1,250,000 records. When I execute
> "select count(*) from table" (with pgAdmin III) it takes about 40
> seconds.
> I think that takes much too long. Can you please give me hints
> on where I can look for improvements?

This is a FAQ.  This operation is optimized in some other database
engines, but not in PostgreSQL, due to the way its concurrency control
works.  There are many workarounds; perhaps the easiest is to get an
approximate count using
select reltuples from pg_class where relname = 'your_table' and relkind = 'r';

merlin

Re: select count(*) performance

From:
Brian Hurt
Date:
runic wrote:

>Hello Group,
>
>I'm new to PostgreSQL, so please forgive a newbie
>question. I have a table with about 1,250,000 records. When I execute
>"select count(*) from table" (with pgAdmin III) it takes about 40
>seconds.
>I think that takes much too long. Can you please give me hints
>on where I can look for improvements?
>
>TIA, Det
>
>

1) VACUUM FULL the table, maybe the whole database.
2) Buy more/faster hard disks

The problem is that count(*) on a table has to scan the whole table,
because Postgres uses MVCC for its concurrency control.  This
is normally a huge win, but one of the few places where it's a loss is
doing count(*) over a whole table.  In this case, Postgres has no choice
but to inspect each and every row to see whether it's live, and thus
has no choice but to read in the whole table.

If you've been doing a lot of inserts, updates, and/or deletes to the
table, and you either don't have autovacuum turned on or it isn't
aggressive enough, the table can be littered with dead rows that haven't
been removed yet.  Postgres still has to read those rows to make sure
they're dead, so it can easily end up reading many multiples of the
number of live rows in the table.  What vacuum does is go through
and remove those dead rows.
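A quick way to gauge how much dead space vacuum finds is to run it with the VERBOSE option, which reports removed row versions and page counts per table (the table name below is a placeholder):

```sql
-- Reclaim dead rows and report, per table, how many dead row
-- versions were removed and how many pages remain:
VACUUM VERBOSE your_table;
```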

If that isn't the problem, then it's just that you have to read the
whole table.  If the rows are large enough, and the disk subsystem is
slow enough, this can just take a while.  My advice in this case is to
buy more and/or faster disks, to speed up the reading of the table.

Brian


Re: select count(*) performance

From:
"Kevin Grittner"
Date:
>>> On Fri, Aug 10, 2007 at  8:08 AM, in message
<46BC6342.4010002@janestcapital.com>, Brian Hurt <bhurt@janestcapital.com>
wrote:
> runic wrote:
>
>>I have a table with about 1,250,000 records. When I execute
>>"select count(*) from table" (with pgAdmin III) it takes about 40
>>seconds.
>>I think that takes much too long. Can you please give me hints
>>on where I can look for improvements?
>>
>>TIA, Det
>
> 1) VACUUM FULL the table, maybe the whole database.
> 2) Buy more/faster hard disks

Det,

Forty seconds is a long time for only 1.25 million rows.  I just ran a count
against a production database and it took 2.2 seconds to get a count from a
table with over 6.8 million rows.

In addition to the advice given by Brian, I would recommend:

3)  Make sure you are using a recent version of PostgreSQL.  There have been
significant performance improvements lately.  If you're not on 8.2.4, I'd
recommend you convert while your problem table is that small.

4)  Make sure you read up on PostgreSQL configuration.  Like many products,
PostgreSQL has a default configuration which is designed to start on just
about anything, but which will not perform well without tuning.
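For example, a few of the settings that most often need raising from their defaults look like this in postgresql.conf (the values below are illustrative only; sensible numbers depend entirely on your hardware and workload):

```
# postgresql.conf -- illustrative starting points, not recommendations
shared_buffers = 128MB          # default is far too small for most servers
effective_cache_size = 1GB      # how much the OS is likely caching for you
work_mem = 8MB                  # memory available per sort/hash operation
```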

5)  Consider whether you need an exact count.  I just selected the reltuples
value from pg_class for the table with the 6.8 million rows, and the value I
got was only off from the exact count by 0.0003%.  That's close enough for
many purposes, and the run time is negligible.
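Fetching that approximate count looks like this ('your_table' is a placeholder):

```sql
-- Planner's row estimate, refreshed by VACUUM/ANALYZE;
-- relkind = 'r' restricts the match to ordinary tables:
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'your_table' AND relkind = 'r';
```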

6)  If you're looking at adding hardware, RAM helps.  It can help a lot.

I'll finish by restating something Brian mentioned.  VACUUM. Use autovacuum.
You should also do scheduled VACUUM ANALYZE, under the database superuser
login, on a regular basis.  We do it nightly on most of our databases.
Without proper maintenance, dead space will accumulate and destroy your
performance.

Also, I don't generally recommend VACUUM FULL.  If a table needs aggressive
maintenance, I recommend using CLUSTER, followed by an ANALYZE.  It does a
better job of cleaning things up, and is often much faster.
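Using the pre-8.3 syntax, that maintenance might look like the following sketch (the index and table names are placeholders):

```sql
-- Rewrite the table in the order of one of its indexes,
-- discarding dead space, then refresh planner statistics:
CLUSTER your_table_pkey ON your_table;
ANALYZE your_table;
```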

I hope this helps.

-Kevin



Re: select count(*) performance

From:
valgog
Date:
On Aug 11, 5:54 pm, Detlef Rudolph <ru...@gmx.de> wrote:
> Hello Group,
>
> I've tried VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL brings performance down
> from about 40 secs to 8. I think in the future I'll
> use the reltuples value from pg_class for this table.
>
> Thanks a lot for your answers and a good Sunday,
> Det

Just don't forget that reltuples is only counted and updated in
pg_class during vacuuming or analyzing of a table... so the value is
only an APPROXIMATION....
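On 8.2 and later you can check when those statistics were last refreshed ('your_table' is a placeholder):

```sql
-- When was this table last vacuumed/analyzed,
-- either manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'your_table';
```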

-- Valentine


Re: select count(*) performance

From:
"Kevin Grittner"
Date:
>>> valgog <valgog@gmail.com> 08/13/07 6:38 AM >>>
On Aug 11, 5:54 pm, Detlef Rudolph <ru...@gmx.de> wrote:
>
> I've tried VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL brings performance down
> from about 40 secs to 8.

Det,

I don't think anyone meant to suggest that VACUUM ANALYZE would improve the
count speed on a table which had become bloated, but its routine use would
PREVENT a table from becoming bloated.  Once bloat occurs, you need more
aggressive maintenance, like VACUUM FULL or CLUSTER.

VACUUM FULL tends to cause index bloat, so you will probably see performance
issues in other queries at the moment.  You will probably need to REINDEX
the table or use CLUSTER to clean that up.
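A sketch of that index cleanup ('your_table' is a placeholder):

```sql
-- Rebuild every index on the table to remove index bloat
-- left behind by VACUUM FULL:
REINDEX TABLE your_table;
```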

-Kevin




Re: select count(*) performance

From:
Detlef Rudolph
Date:
Hello Group,

I've tried VACUUM ANALYSE; that doesn't help
much, but VACUUM FULL brings performance down
from about 40 secs to 8. I think in the future I'll
use the reltuples value from pg_class for this table.

Thanks a lot for your answers and a good Sunday,
Det

Re: select count(*) performance

From:
"Scott Marlowe"
Date:
On 8/11/07, Detlef Rudolph <runic@gmx.de> wrote:
> Hello Group,
>
> I've tried VACUUM ANALYSE; that doesn't help
> much, but VACUUM FULL brings performance down
> from about 40 secs to 8.

If VACUUM FULL fixes a performance problem, then you have a regular
vacuum problem of some sort.

Make sure regular vacuum runs often enough, and make sure your free
space map (FSM) settings are high enough to allow it to reclaim all
deleted tuples when it does run.
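In 8.2-era releases those are the max_fsm_* settings in postgresql.conf (the values below are illustrative; running VACUUM VERBOSE on the whole database reports whether the current settings are large enough):

```
# postgresql.conf -- free space map sizing (settings removed in 8.4)
max_fsm_pages = 500000      # must cover all pages with reclaimable space
max_fsm_relations = 1000    # number of tables and indexes tracked
```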