Re: more problems with count(*) on large table

Поиск
Список
Период
Сортировка
От Bill Moran
Тема Re: more problems with count(*) on large table
Дата
Msg-id 20071001133432.e6184b1e.wmoran@potentialtech.com
обсуждение исходный текст
Ответ на Re: more problems with count(*) on large table  (Mike Charnoky <noky@nextbus.com>)
Ответы Re: more problems with count(*) on large table  (Martijn van Oosterhout <kleptog@svana.org>)
Re: more problems with count(*) on large table  (Mike Charnoky <noky@nextbus.com>)
Список pgsql-general
In response to Mike Charnoky <noky@nextbus.com>:

> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
> first time the data is queried it takes about 40 minutes.  If I try the
> query again, it finishes in 1-2 minutes!

This sounds like a caching issue.  My guess at what's happening is that
other operations are pushing this data out of the shared_buffers, so
when you run it, the system has to pull a bunch of tuples off the disk
to check them.  If you run it again immediately, the tuples are still in
memory, and it runs very fast.

If this is the case, you can speed up things by adding RAM/shared_buffers,
or by moving to faster disks.  The RAM solution is going to give you the
biggest performance improvement.

However, if there's enough other data on this system, you may have
difficulty getting enough RAM to mitigate the problem, in which case,
faster disks are going to be your best bet.

How much RAM do you have, and how much of it is allocated to shared_buffers?
What's your IO subsystem look like?

> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others.  I have done "set statistics
> 100" for the evtime field in this table.
>
> Here is the output from EXPLAIN ANALYZE.  This is the same query run
> back to back, first time takes 42 minutes, second time takes less than 2
> minutes!
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>   QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=2549854.351..2549854.352 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
> rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 2549854.411 ms
> (4 rows)
>
> Time: 2549943.506 ms
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>  QUERY PLAN
>
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=111200.943..111200.944 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
> rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 111201.000 ms
> (4 rows)
>
> Time: 111298.695 ms
>
>
> Mike
>
> Gregory Stark wrote:
> > "Mike Charnoky" <noky@nextbus.com> writes:
> >
> >> I altered the table in question, with "set statistics 100" on the
> >> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> >> queries don't seem to hang, but it still takes a long time to do the count:
> >>  * "where evtime between '2007-09-26' and '2007-09-27'"
> >>    took 37 minutes to run (result was ~12 million)
> >>  * "where evtime between '2007-09-25' and '2007-09-26'"
> >>    took 40 minutes to run (result was ~14 million)
> >>
> >> Still stymied about the seemingly random performance, especially since I
> >> have seen this query execute in 2 minutes.
> >
> >
> > And the "explain analyze" for these?
> >
> > Are you still sure it's certain date ranges which are consistently problems
> > and others are consistently fast? Or could it be something unrelated.
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq


--
Bill Moran
http://www.potentialtech.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: more problems with count(*) on large table
Следующее
От: "paul rivers"
Дата:
Сообщение: Re: Partitioned table limitation