Обсуждение: Re: [NOVICE] Aggregates and Indexes
Adam, > I do a lot of reporting based on such SQL statements. Counting rows from > large datasets. Since the PG gurus don't seem to think this is such a big > deal can someone enlighten me as to why? I am not a core developer, but I will take a crack at your question anyway based on my personal knowledge. I am sure that Tom, Bruce, or Stephan will correct my mistaken assumptions. Actually, the inability to index MAX, MIN, and COUNT is an annoyance I think everyone would like to fix; it's just that the technical challenge is much greater than the benefit for the core team. If you know a programmer who wants to tackle it, go ahead. Presumably you've already read the many previous e-mails on why it is a techincal challenge. Now, even if that challenge were solved, indexing for aggregates would still be of limited usefulness because: Few (if any) RDBMSs can index for SUM, STDEV, or other "calcuation" aggregates. This is because the value of every single record must be incuded and estimates are not possible, so the performance gain from using an index is infinitessimal except for those RDBMSs with very slow file access times. For Postgres custom aggregates, using a standard index is impossible, for reasons I think are obvious. That leaves MAX, MIN, and COUNT. All of these aggregates should, in an ideal world, be index-responsive for large data sets. Once again, for small data sets or subsets, indexes are not useful. And there is a workaround for Min and Max. So what we'd be looking at is either developing a special parser routine for MIN, MAX, and COUNT (and them only) just to index for those aggregates, or coming up with a new type of index just for aggregates. The first approach is a bit of kludge that would require a lot of debugging; the second is probably the best long-term solution, but would require a great deal of innovation. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > For Postgres custom aggregates, using a standard index is impossible, for > reasons I think are obvious. > That leaves MAX, MIN, and COUNT. All of these aggregates should, in an > ideal world, be index-responsive for large data sets. While it's fairly clear how one might use an index for MAX/MIN (basically, make the optimizer transform it into a SELECT ... ORDER BY ... LIMIT 1 operation, which can then be done with an indexscan), I really don't see how an index can help for COUNT. The real problem with COUNT is that any attempt to maintain such a value on-the-fly creates a single-point bottleneck for all insertions and deletions on the table. The perspective of most of the developers is that that cost outweighs any possible savings from having an instantaneous COUNT operation. When you add in the issue that under MVCC there isn't a unique COUNT that's the same for all transactions, it's just not worth thinking about. (And do I need to point out that with WHERE conditions, GROUP BY, or a variable COUNT argument, all hope of such optimization disappears anyway? A global rowcount doesn't help in those cases.) The MAX/MIN issue will probably be addressed someday, but since there is a good workaround it's not very high on anyone's TODO queue. We have many more-pressing problems. regards, tom lane
Josh Berkus wrote: > > Adam, > > > I do a lot of reporting based on such SQL statements. Counting rows from > > large datasets. Since the PG gurus don't seem to think this is such a big > > deal can someone enlighten me as to why? > > I am not a core developer, but I will take a crack at your question anyway > based on my personal knowledge. I am sure that Tom, Bruce, or Stephan will > correct my mistaken assumptions. I have seen a few mentions in the past weeks about core vs. non-core developers. I should reiterate that the core group feels there is no distinction between the opinions of the core people and the other major developers. Everyone gets just one vote. The only reason for core is to deal with certain "sensitive" issues that can't be dealt with in public forums, and making _big_ decisions like should we release on Friday or Monday. ;-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce, > I have seen a few mentions in the past weeks about core vs. non-core > developers. I should reiterate that the core group feels there is no > distinction between the opinions of the core people and the other > major > developers. Everyone gets just one vote. Which is why it's important that us peripheral developers make sure you get the respect you deserve, since you're so un-assertive. <grin> Why I often make a comment like that in my messages is that, as a strictly *documentation* contributor, I often make mistaken assumptions about why some things in Postgres work the way they do. Heck, even when I'm right, I'm usually just parroting something you, Tom, or Stephan have told me. Or, to put it another way: given the number of questions I field on this list, and the stuff I write for Techdocs and elsewhere, there is a tendency from the public to assume that I am a "big wheel" in the PostgreSQL project. Which I am not. I've been here for a while, yes, but by no means do my contributions equal in scope, importance, or dedication the contributions of you, Tom, Stephan, Peter, Thomas, Joe, or two dozen other contributors whose names don't spring immediately to mind. Further, I do not consult with the PostgreSQL "global development team" before mouthing off; my opinions are mine, not those of the postgresql project. Thus, the comment in so many of my e-mails of "I am not a core developer." Capisce? -Josh Berkus
Oh, sure I understand. There were a number of people mentioning the core issue. What the core/major folks do is more give information about historical or practical ideas behind certain issues, so in that way they do have a strong voice, but only in the way their ideas effect other people's opinions and votes. --------------------------------------------------------------------------- Josh Berkus wrote: > Bruce, > > > I have seen a few mentions in the past weeks about core vs. non-core > > developers. I should reiterate that the core group feels there is no > > distinction between the opinions of the core people and the other > > major > > developers. Everyone gets just one vote. > > Which is why it's important that us peripheral developers make sure you > get the respect you deserve, since you're so un-assertive. <grin> > > Why I often make a comment like that in my messages is that, as a > strictly *documentation* contributor, I often make mistaken assumptions > about why some things in Postgres work the way they do. Heck, even > when I'm right, I'm usually just parroting something you, Tom, or > Stephan have told me. > > Or, to put it another way: given the number of questions I field on > this list, and the stuff I write for Techdocs and elsewhere, there is a > tendency from the public to assume that I am a "big wheel" in the > PostgreSQL project. Which I am not. I've been here for a while, > yes, but by no means do my contributions equal in scope, importance, or > dedication the contributions of you, Tom, Stephan, Peter, Thomas, Joe, > or two dozen other contributors whose names don't spring immediately to > mind. > > Further, I do not consult with the PostgreSQL "global development team" > before mouthing off; my opinions are mine, not those of the postgresql > project. > > Thus, the comment in so many of my e-mails of "I am not a core > developer." > > Capisce? > > -Josh Berkus > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026