Re: count(*) and bad design was: Experiences with extensibility

Поиск
Список
Период
Сортировка
От Ivan Sergio Borgonovo
Тема Re: count(*) and bad design was: Experiences with extensibility
Дата
Msg-id 20080109200105.20114cec@webthatworks.it
обсуждение исходный текст
Ответ на Re: count(*) and bad design was: Experiences with extensibility  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Ответы Re: count(*) and bad design was: Experiences with extensibility  ("Joshua D. Drake" <jd@commandprompt.com>)
Список pgsql-general
On Wed, 9 Jan 2008 10:30:45 -0600
"Scott Marlowe" <scott.marlowe@gmail.com> wrote:

> Now, everything's a tradeoff.  If PostgreSQL had visibility
> information in the indexes, it would have to lock both the table and
> index for every write, thus slowing down all the other queries that
> are trying to access the table.  It would be a tradeoff that
> sacrificed write speed for read speed. In a db that was used mostly
> for writing, it would likely be a fair trade.  In a db that did a
> lot of writing, it might slow the whole thing to a crawl.

OK... we are getting near to the point. I understand the trade-off
problem in storing into indexes id the row is still there.
Is there a way to get the count of the rows that *may be* there, I
mean the one that satisfy the where clauses no matter if in the
meanwhile they were deleted or not and reindex on a regular basis.
That would be simple and I think fast and an acceptable solution for
most CMS.

> Now, if there's a where clause that's selective enough, then a
> count(*) query may use the index and be a bit faster, but remember,
> in pgsql, it's still got to actually hit the table to see if each
> tuple really is visible to this transaction, so the index needs to
> be fairly selective to be a win.

But well if the query is not selective enough I think the problem is
shared with other DB as well.

> A possible workaround is to have something like a separate table
> with nothing but the IDs and whatever would be in your where clause
> for the tables you're accessing with a foreign key to it, and use
> THAT for a count(*).  Since the rows are skinnier, the count(*)
> will be faster. Another alternative is to have a trigger fire that
> keeps a track of the size of the table in a summary table when rows
> are added and deleted.

If the where clause is dynamic, how can it help?

> select count(*) from bigtable where inserttime > now() - interval
> '1 week'; count
> ---------
>  1254269
> (1 row)
>
> Time: 21422.368 ms
>
> Second run:
> select count(*) from bigtable where inserttime > now() - interval
> '1 week'; count
> ---------
>  1254320
> (1 row)
>
> Time: 2001.204 ms
>
> With the data loaded into shared_buffers / linux kernel cache,
> that's not too bad.
>
> Now, I run the same query against our production oracle machine,
> which is a MUCH more powerful server...
>
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
>
>   COUNT(*)
> ----------
>    1255972
>
> Elapsed: 00:00:18.62
>
> second run:
> SQL> select count(*) from bigtable where inserttime > SYSDATE-7;
>
>   COUNT(*)
> ----------
>    1255973
> Elapsed: 00:00:00.98
>
> Now, obviously, Oracle's got some optimizations for what it's got in
> the buffer there, but the first run isn't really any faster. In
> fact, for a much more powerful machine, the performance was,
> relatively speaking, pretty bad compared to my little 1 CPU 1 sw
> RAID-10 reporting server.

Interesting... oh let's put some emotions in it: cool ;)

> So, while PostgreSQL's count(*) performance isn't blindingly fast,
> it's not the dog some people make it out to be either.

Continuing here from your previous post... while you guys are
concerned of competing with big boxes running Oracle or thinking
about Google's webfarms I'm thinking about stuff that is in between a
mere CMS for mortals and stuff that deals with money where you don't
have budgets and hits enough to justify a complex logic to do stats
analysis or collect enough stats to make any forecast reasonable.

In this context a simpler faster even if less accurate count may be
very handy.

thanks again

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


В списке pgsql-general по дате отправления:

Предыдущее
От: "Josh Harrison"
Дата:
Сообщение: Re: quick question abt pg_dump and restore
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: count(*) and bad design was: Experiences with extensibility