Обсуждение: Re: [NOVICE] Aggregates and Indexes

Поиск
Список
Период
Сортировка

Re: [NOVICE] Aggregates and Indexes

От
Josh Berkus
Дата:
Adam,

> I do a lot of reporting based on such SQL statements.  Counting rows from
> large datasets.  Since the PG gurus don't seem to think this is such a big
> deal can someone enlighten me as to why?

I am not a core developer, but I will take a crack at your question anyway
based on my personal knowledge.  I am sure that Tom, Bruce, or Stephan will
correct my mistaken assumptions.

Actually, the inability to index MAX, MIN, and COUNT is an annoyance I think
everyone would like to fix; it's just that the technical challenge is much
greater than the benefit for the core team.   If you know a programmer who
wants to tackle it, go ahead.

Presumably you've already read the many previous e-mails on why it is a
techincal challenge.

Now, even if that challenge were solved, indexing for aggregates would still
be of limited usefulness because:

Few (if any) RDBMSs can index for SUM, STDEV, or other "calcuation"
aggregates.  This is because the value of every single record must be incuded
and estimates are not possible, so the performance gain from using an index
is infinitessimal except for those RDBMSs with very slow file access times.

For Postgres custom aggregates, using a standard index is impossible, for
reasons I think are obvious.

That leaves MAX, MIN, and COUNT.    All of these aggregates should, in an
ideal world, be index-responsive for large data sets.  Once again, for small
data sets or subsets, indexes are not useful.   And there is a workaround for
Min and Max.

So what we'd be looking at is either developing a special parser routine for
MIN, MAX, and COUNT (and them only) just to index for those aggregates, or
coming up with a new type of index just for aggregates.   The first approach
is a bit of kludge that would require a lot of debugging; the second is
probably the best long-term solution, but would require a great deal of
innovation.


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: [NOVICE] Aggregates and Indexes

От
Tom Lane
Дата:
Josh Berkus <josh@agliodbs.com> writes:
> For Postgres custom aggregates, using a standard index is impossible, for
> reasons I think are obvious.

> That leaves MAX, MIN, and COUNT.    All of these aggregates should, in an
> ideal world, be index-responsive for large data sets.

While it's fairly clear how one might use an index for MAX/MIN
(basically, make the optimizer transform it into a SELECT ... ORDER BY
... LIMIT 1 operation, which can then be done with an indexscan),
I really don't see how an index can help for COUNT.

The real problem with COUNT is that any attempt to maintain such a value
on-the-fly creates a single-point bottleneck for all insertions and
deletions on the table.  The perspective of most of the developers is
that that cost outweighs any possible savings from having an
instantaneous COUNT operation.

When you add in the issue that under MVCC there isn't a unique COUNT
that's the same for all transactions, it's just not worth thinking
about.  (And do I need to point out that with WHERE conditions,
GROUP BY, or a variable COUNT argument, all hope of such optimization
disappears anyway?  A global rowcount doesn't help in those cases.)

The MAX/MIN issue will probably be addressed someday, but since there
is a good workaround it's not very high on anyone's TODO queue.  We have
many more-pressing problems.

            regards, tom lane

Re: [NOVICE] Aggregates and Indexes

От
Bruce Momjian
Дата:
Josh Berkus wrote:
>
> Adam,
>
> > I do a lot of reporting based on such SQL statements.  Counting rows from
> > large datasets.  Since the PG gurus don't seem to think this is such a big
> > deal can someone enlighten me as to why?
>
> I am not a core developer, but I will take a crack at your question anyway
> based on my personal knowledge.  I am sure that Tom, Bruce, or Stephan will
> correct my mistaken assumptions.

I have seen a few mentions in the past weeks about core vs. non-core
developers.  I should reiterate that the core group feels there is no
distinction between the opinions of the core people and the other major
developers.  Everyone gets just one vote.

The only reason for core is to deal with certain "sensitive" issues that
can't be dealt with in public forums, and making _big_ decisions like
should we release on Friday or Monday.  ;-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [NOVICE] Aggregates and Indexes

От
"Josh Berkus"
Дата:
Bruce,

> I have seen a few mentions in the past weeks about core vs. non-core
> developers.  I should reiterate that the core group feels there is no
> distinction between the opinions of the core people and the other
> major
> developers.  Everyone gets just one vote.

Which is why it's important that us peripheral developers make sure you
get the respect you deserve, since you're so un-assertive. <grin>

Why I often make a comment like that in my messages is that, as a
strictly *documentation* contributor, I often make mistaken assumptions
about why some things in Postgres work the way they do.   Heck, even
when I'm right, I'm usually just parroting something you, Tom, or
Stephan have told me.

Or, to put it another way:  given the number of questions I field on
this list, and the stuff I write for Techdocs and elsewhere, there is a
tendency from the public to assume that I am a "big wheel" in the
PostgreSQL project.   Which I am not.   I've been here for a while,
yes, but by no means do my contributions equal in scope, importance, or
dedication the contributions of you, Tom, Stephan, Peter, Thomas, Joe,
or two dozen other contributors whose names don't spring immediately to
mind.

Further, I do not consult with the PostgreSQL "global development team"
before mouthing off; my opinions are mine, not those of the postgresql
project.

Thus, the comment in so many of my e-mails of "I am not a core
developer."

Capisce?

-Josh Berkus

Re: [NOVICE] Aggregates and Indexes

От
Bruce Momjian
Дата:
Oh, sure I understand.  There were a number of people mentioning the
core issue.  What the core/major folks do is more give information about
historical or practical ideas behind certain issues, so in that way they
do have a strong voice, but only in the way their ideas effect other
people's opinions and votes.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Bruce,
>
> > I have seen a few mentions in the past weeks about core vs. non-core
> > developers.  I should reiterate that the core group feels there is no
> > distinction between the opinions of the core people and the other
> > major
> > developers.  Everyone gets just one vote.
>
> Which is why it's important that us peripheral developers make sure you
> get the respect you deserve, since you're so un-assertive. <grin>
>
> Why I often make a comment like that in my messages is that, as a
> strictly *documentation* contributor, I often make mistaken assumptions
> about why some things in Postgres work the way they do.   Heck, even
> when I'm right, I'm usually just parroting something you, Tom, or
> Stephan have told me.
>
> Or, to put it another way:  given the number of questions I field on
> this list, and the stuff I write for Techdocs and elsewhere, there is a
> tendency from the public to assume that I am a "big wheel" in the
> PostgreSQL project.   Which I am not.   I've been here for a while,
> yes, but by no means do my contributions equal in scope, importance, or
> dedication the contributions of you, Tom, Stephan, Peter, Thomas, Joe,
> or two dozen other contributors whose names don't spring immediately to
> mind.
>
> Further, I do not consult with the PostgreSQL "global development team"
> before mouthing off; my opinions are mine, not those of the postgresql
> project.
>
> Thus, the comment in so many of my e-mails of "I am not a core
> developer."
>
> Capisce?
>
> -Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026