Re: Yet Another COUNT(*)...WHERE...question

Поиск
Список
Период
Сортировка
От Scott Marlowe
Тема Re: Yet Another COUNT(*)...WHERE...question
Дата
Msg-id dcc563d10708150824q2ed10e27g6457dcec8cf14671@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Yet Another COUNT(*)...WHERE...question  ("Phoenix Kiula" <phoenix.kiula@gmail.com>)
Ответы Re: Yet Another COUNT(*)...WHERE...question
Re: Yet Another COUNT(*)...WHERE...question
Список pgsql-general
On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> On 15/08/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> > On 8/15/07, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> > > On 15/08/07, Gregory Stark <stark@enterprisedb.com> wrote:
> > > > "Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> > > >
> > > > > I'm grappling with a lot of reporting code for our app that relies on
> > > > > queries such as:
> > > > >
> > > > >      SELECT COUNT(*) FROM TABLE WHERE ....(conditions)...
> > > > >...
> > > > > The number of such possibilities for multiple WHERE conditions is
> > > > > infinite...
> > > >
> > > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > > possibilities -- well you can, just run the query above -- but if you want > to do better it's all about
understandingyour data. 
> > >
> > >
> > > I am not sure what the advice here is. The WHERE condition comes from
> > > the indices. So if the query was not "COUNT(*)" but just a couple of
> > > columns, the query executes in less than a second. Just that COUNT(*)
> > > becomes horribly slow.
> >
> > Sorry, but I don't believe you.  if you're doing a count(*) on the
> > same dataset that returns in < 1 second, then the count(*) with the
> > same where clause will run in < 1 second.  I haven't seen pgsql do
> > anything else.
>
>
>
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".
>
> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1. However, to calculate the total number of pages, they probably
> do a separate counts query, because doing a "select *" and then
> counting the number of rows returned would be even more inefficient
> than a count(*).

When I go to amazon.com I only ever get three pages of results.  ever.
 Because they know that returning 190 pages is not that useful, as
hardly anyone is going to wander through that many pages.

Google, you'll notice says "Results 1 - 10 of about 5,610,000 for
blacksmith"  i.e. it's guesstimating as well.  no reason for google to
look at every single row for blacksmith to know that there's about 5.6
million.

> So, in reporting, two queries are fairly common I would think, unless
> I am missing something?

Yes, optimization. :)  You don't need an exact count to tell someone
that there's more data and they can go to it.  Note that if you are
planning on doing things google sized, you'll need to do what they
did, invent your own specialized database.

For us mere mortals, it's quite likely that you can do something like:

explain select * from table where field like 'abc%';

and then parse the explain output for an approximate number.

В списке pgsql-general по дате отправления:

Предыдущее
От: "Phoenix Kiula"
Дата:
Сообщение: Customizing psql console to show execution times
Следующее
От: Richard Broersma Jr
Дата:
Сообщение: Re: Yet Another COUNT(*)...WHERE...question