Re: Speedier count(*)

Поиск
Список
Период
Сортировка
От John A Meinel
Тема Re: Speedier count(*)
Дата
Msg-id 42FAA4E6.5070703@arbash-meinel.com
обсуждение исходный текст
Ответ на Speedier count(*)  (Dan Harris <fbsd@drivefaster.net>)
Список pgsql-performance
Dan Harris wrote:
> I have a web page for my customers that shows them count of records  and
> some min/max date ranges in each table of a database, as this is  how we
> bill them for service.  They can log in and check the counts  at any
> time.  I'd like for the counts to be as fresh as possible by  keeping
> this dynamic, but I will use a periodic 'snapshot'/cron job  if that is
> the only option to speed this up.   I have thought about  using the
> table statistics, but the estimate error is probably  unacceptable
> because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return
> count(*) in a split second on multi-million row tables, even though  it
> is a MUCH slower box hardware-wise, but it's now taking many  seconds to
> run. I have read in the archives the problems MVCC brings  into the
> count(*) dilemma forcing Pg to run a seq scan to get  counts.  Does
> SQLServer not use MVCC or have they found another  approach for arriving
> at this number?  Compounding all the min/max  and counts from other
> tables and all those queries take about a  minute to run. The tables
> will contain anywhere from 1 million to 40  million rows.

I believe SQL Server doesn't use MVCC in the same way. At the very
least, it stores some row information in the index, so it can get some
info from just an index, without having to go to the actual page (MVCC
requires a main page visit to determine visibility.)

Depending on how much it impacts performance, you can create an
INSERT/UPDATE trigger so that whenever a new entry is added, it
automatically updates a statistics table. It would be maintained as you
go, rather than periodically like a cron job.

I would go Cron if things can be slightly out of date (like 1 hour at
least), and you need updates & inserts to not be slowed down.
Otherwise I think the trigger is nicer, since it doesn't do redundant
work, and means everything stays up-to-date.


>
> Also, I am using "select ... group by ... order by .. limit 1" to get
> the min/max since I have already been bit by the issue of min() max()
> being slower.
>
>
> -Dan

John
=:->

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Michael Fuhr
Дата:
Сообщение: Re: Speedier count(*)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: partial index regarded more expensive