Discussion: B-Heaps
Just curious if this would apply to PostgreSQL: http://queue.acm.org/detail.cfm?id=1814327
Now that I've read it, it seems like a no-brainer. So, how does PostgreSQL deal with the different latencies involved in accessing data on disk for searches / sorts vs. accessing data in memory? Is it allocated in a similar way as described in the article such that disk access is reduced to a minimum?
On 15/06/10 06:21, Eliot Gable wrote:
> Just curious if this would apply to PostgreSQL:
> http://queue.acm.org/detail.cfm?id=1814327
>
> Now that I've read it, it seems like a no-brainer. So, how does PostgreSQL
> deal with the different latencies involved in accessing data on disk for
> searches / sorts vs. accessing data in memory? Is it allocated in a similar
> way as described in the article such that disk access is reduced to a
> minimum?

I don't think we have any binary heap structures that are large enough for this to matter. We use a binary heap when merging tapes in the tuplesort code, for example, but that's tiny.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
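The tuplesort use Heikki mentions - a small binary heap holding one entry per tape while merging sorted runs - can be sketched as a generic k-way merge. This is an illustration of the technique, not PostgreSQL's actual tuplesort code:

```python
import heapq

def merge_tapes(tapes):
    """K-way merge of sorted runs ('tapes') using a binary heap that holds
    just one element per tape -- the heap stays tiny even for huge inputs."""
    # Seed the heap with the first element of each non-empty tape.
    heap = [(tape[0], i, 0) for i, tape in enumerate(tapes) if tape]
    heapq.heapify(heap)
    out = []
    while heap:
        value, tape_no, pos = heapq.heappop(heap)
        out.append(value)
        # Refill from the tape the smallest element came from.
        if pos + 1 < len(tapes[tape_no]):
            heapq.heappush(heap, (tapes[tape_no][pos + 1], tape_no, pos + 1))
    return out

runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(merge_tapes(runs))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With only one heap entry per tape, the heap holds a handful of items regardless of run length, which is why page layout of the heap itself hardly matters here.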
Eliot Gable wrote:
> Just curious if this would apply to PostgreSQL:
> http://queue.acm.org/detail.cfm?id=1814327

It's hard to take this seriously at all when it's so ignorant of actual research in this area. Take a look at http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf for a second, specifically page 9. See the "van Emde Boas" layout? That's basically the same as what this article is calling a B-heap, and the idea goes back to at least 1977. As you can see from that paper, the idea of using it to optimize for multi-level caches goes back to at least 2001.

Based on the performance numbers, it seems a particularly helpful optimization for the type of in-memory caching that his Varnish tool is good at, so kudos for reinventing the right wheel. But that's an environment with one level of cache: you're delivering something from memory, or not. You can't extrapolate from what works for that very far.

> So, how does PostgreSQL deal with the different latencies involved in
> accessing data on disk for searches / sorts vs. accessing data in
> memory? Is it allocated in a similar way as described in the article
> such that disk access is reduced to a minimum?

PostgreSQL is modeling a much more complicated situation, where there are many levels of caches, from CPU to disk. When executing a query, the database tries to manage that by estimating the relative costs of CPU operations, row operations, sequential disk reads, and random disk reads. Those fundamental operations are then added up to build more complicated machinery like sorting. To minimize query execution cost, various query plans are considered, the cost is computed for each one, and the cheapest one gets executed. This has to take into account a wide variety of subtle tradeoffs related to whether memory should be used for things that would otherwise happen on disk.

There are three primary ways to search for a row, three main ways to do a join, and two ways to sort, and cost estimates balancing CPU time, memory, and disk access need to be made for all of them. The problem Varnish is solving is most like how PostgreSQL decides which disk pages to keep in memory, specifically the shared_buffers structure. Even there, the problem the database is trying to solve is quite a bit more complicated than what an HTTP cache has to deal with. For details about what the database does there, see "Inside the PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
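The cost-based plan comparison Greg describes can be sketched using PostgreSQL's default planner constants (seq_page_cost = 1.0, random_page_cost = 4.0, cpu_tuple_cost = 0.01, cpu_index_tuple_cost = 0.005). The formulas below are deliberately simplified illustrations, not the actual arithmetic in the planner's costsize.c:

```python
# Toy illustration of cost-based plan choice, using PostgreSQL's default
# planner cost constants. The real planner models much more detail.
SEQ_PAGE_COST = 1.0           # one sequentially-fetched page
RANDOM_PAGE_COST = 4.0        # one randomly-fetched page
CPU_TUPLE_COST = 0.01         # processing one row
CPU_INDEX_TUPLE_COST = 0.005  # processing one index entry

def seq_scan_cost(pages, rows):
    # Read every page sequentially and examine every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows, heap_pages_touched):
    # Random page fetches for matching rows, plus per-entry index CPU work.
    return (heap_pages_touched * RANDOM_PAGE_COST
            + matching_rows * (CPU_TUPLE_COST + CPU_INDEX_TUPLE_COST))

def choose_plan(pages, rows, selectivity):
    matching = int(rows * selectivity)
    plans = {
        # Pessimistic assumption: one random page fetch per matching row.
        "index scan": index_scan_cost(matching, matching),
        "seq scan": seq_scan_cost(pages, rows),
    }
    return min(plans, key=plans.get)

# A highly selective predicate favors the index; a non-selective one
# makes the full sequential scan cheaper despite reading everything.
print(choose_plan(pages=10_000, rows=1_000_000, selectivity=0.0001))  # index scan
print(choose_plan(pages=10_000, rows=1_000_000, selectivity=0.5))     # seq scan
```

The point of the exercise: neither access path is universally best, which is why the planner prices each candidate plan and picks the cheapest rather than hard-coding a strategy.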
In response to Greg Smith:
> For details about what the database does there, see "Inside the
> PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks

Nice paper, btw., thanks for that!

Regards, Andreas

-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
On Mon, 14 Jun 2010, Eliot Gable wrote:
> Just curious if this would apply to PostgreSQL:
> http://queue.acm.org/detail.cfm?id=1814327

Absolutely, and I said so in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php but applied to the Postgres B-tree indexes instead of heaps. It's a pretty obvious performance improvement really - the principle is that when you do have to fetch a page from a slower medium, you may as well make it count for a lot. Lots of research has already been done on this - the paper linked above is rather behind the times. However, AFAIK, Postgres has not implemented this in any of its indexing systems.

Matthew

-- 
An ant doesn't have a lot of processing power available to it. I'm not
trying to be speciesist - I wouldn't want to detract you from such a
wonderful creature, but, well, there isn't a lot there, is there?
                                        -- Computer Science Lecturer
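Matthew's "make each fetch count" point can be shown with a back-of-envelope page count for a root-to-leaf walk. In the usual BFS array layout of a binary tree the node index roughly doubles at each level, so below the first few levels every step lands on a new page; a blocked layout in the B-heap / van Emde Boas spirit packs a whole subtree into each page instead. The page capacity and tree height below are arbitrary assumptions for illustration:

```python
import math

NODES_PER_PAGE = 512   # assumed page capacity (e.g. small fixed-size nodes)
HEIGHT = 20            # levels in a tree of roughly 2^20 - 1 nodes

def bfs_pages_touched(height, nodes_per_page):
    """Distinct pages touched descending the leftmost path of a tree
    stored in 1-based BFS array order (index of left child = 2*i)."""
    pages, i = set(), 1
    for _ in range(height):
        pages.add(i // nodes_per_page)
        i *= 2
    return len(pages)

def blocked_pages_touched(height, nodes_per_page):
    """Blocked layout: each page holds a complete subtree of about
    log2(P+1) levels, so a walk crosses one page per packed subtree."""
    levels_per_page = int(math.log2(nodes_per_page + 1))  # 512 nodes -> 9 levels
    return math.ceil(height / levels_per_page)

print(bfs_pages_touched(HEIGHT, NODES_PER_PAGE))      # 12 pages
print(blocked_pages_touched(HEIGHT, NODES_PER_PAGE))  # 3 pages
```

Same tree, same walk: the blocked layout fetches a quarter of the pages, which is exactly the "get a lot of work out of each slow fetch" principle.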
Greg Smith wrote:
> Eliot Gable wrote:
>> Just curious if this would apply to PostgreSQL:
>> http://queue.acm.org/detail.cfm?id=1814327
>
> It's hard to take this seriously at all when it's so ignorant of
> actual research in this area. Take a look at
> http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf
> for a second

Interesting paper, thanks for the reference!

> PostgreSQL is modeling a much more complicated situation where there
> are many levels of caches, from CPU to disk. When executing a query,
> the database tries to manage that by estimating the relative costs for
> CPU operations, row operations, sequential disk reads, and random disk
> reads. Those fundamental operations are then added up to build more
> complicated machinery like sorting. To minimize query execution cost,
> various query plans are considered, the cost computed for each one,
> and the cheapest one gets executed. This has to take into account a
> wide variety of subtle tradeoffs related to whether memory should be
> used for things that would otherwise happen on disk. There are three
> primary ways to search for a row, three main ways to do a join, two
> for how to sort, and they all need to have cost estimates made for
> them that balance CPU time, memory, and disk access.

Do you think the cache-oblivious algorithm described in the paper could speed up index scans that hit the disk, in the Postgres (and OS/hardware) multi-level memory case? (So that e.g. random_page_cost could go down?)

regards,
Yeb Havinga
On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> Absolutely, and I said so in
> http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
> but applied to the Postgres B-tree indexes instead of heaps.

This is an interesting idea. I would guess that you could simulate this to some degree by compiling PG with a larger block size. Have you tried this to see whether/how much/for what kind of workloads it helps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Fri, 18 Jun 2010, Robert Haas wrote:
> On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling <matthew@flymine.org> wrote:
>> Absolutely, and I said so in
>> http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
>> but applied to the Postgres B-tree indexes instead of heaps.
>
> This is an interesting idea. I would guess that you could simulate
> this to some degree by compiling PG with a larger block size. Have
> you tried this to see whether/how much/for what kind of workloads it
> helps?

To a degree, that is the case. However, if you follow the thread a bit further back, you will find evidence that when the index is in memory, increasing the page size actually decreases the performance, because it uses more CPU.

To make it clear - 8kB is not an optimal page size for either fully cached data or sparsely cached data. For disc access, large pages are appropriate, on the order of 256kB. Below that, shrinking the page doesn't actually make it much faster to fetch, and the aim is to get the maximum amount of work done per fetch without slowing fetches down significantly.

Given such a large page size, it would then be appropriate to have a better data structure inside the page. Currently, our indexes (at least the GiST ones - I haven't looked at the Btree ones) use a simple linear array in the index page. Using a proper tree inside the index page would improve the CPU usage of the index lookups.

One detail that would need to be sorted out is the cache eviction policy. I don't know if it is best to evict whole 256kB pages, or to evict 8kB sub-pages. Probably the former, which would involve quite a bit of change to the shared memory cache. I can see the cache efficiency decreasing as a result of this, which is the only disadvantage I can see.

This sort of thing has been fairly well researched at an academic level, but has not been implemented in that many real world situations. I would encourage its use in Postgres.

Matthew

-- 
Failure is not an option. It comes bundled with your Microsoft product.
                                        -- Ferenc Mantfeld
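The "proper tree inside the page" idea amounts to replacing the linear in-page entry array, which costs O(n) comparisons per lookup, with a structure that costs O(log n). A minimal sketch over sorted keys (the flat page layout here is invented for illustration, not the actual GiST or btree page format):

```python
import bisect

def linear_lookup(page_keys, target):
    """Linear scan over the in-page entry array: O(n) comparisons."""
    comparisons = 0
    for i, key in enumerate(page_keys):
        comparisons += 1
        if key == target:
            return i, comparisons
    return None, comparisons

def tree_lookup(page_keys, target):
    """Binary search over sorted in-page entries: O(log n) comparisons.
    bisect finds the position; the comparison count is ceil(log2(n))."""
    i = bisect.bisect_left(page_keys, target)
    comparisons = max(1, (len(page_keys) - 1).bit_length())
    if i < len(page_keys) and page_keys[i] == target:
        return i, comparisons
    return None, comparisons

# A large page (say 256kB) might hold thousands of entries.
page = list(range(0, 16384, 2))          # 8192 sorted keys
idx_lin, c_lin = linear_lookup(page, 16382)
idx_tree, c_tree = tree_lookup(page, 16382)
assert idx_lin == idx_tree
print(c_lin, c_tree)    # 8192 vs 13 comparisons for the worst-case key
```

The gap between the two counts grows with the page size, which is why a flat array that is tolerable at 8kB becomes a CPU problem at 256kB.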
Matthew Wakeling wrote:
> This sort of thing has been fairly well researched at an academic
> level, but has not been implemented in that many real world
> situations. I would encourage its use in Postgres.

I guess, but don't forget that work on PostgreSQL is driven by what problems people are actually running into. There's a long list of performance improvements sitting in the TODO list waiting for people to find time to work on them, ones that we're quite certain are useful. That anyone is going to chase after any of these speculative ideas from academic research instead of one of those is unlikely.

Your characterization of the potential speed-up here is "Using a proper tree inside the index page would improve the CPU usage of the index lookups", which seems quite reasonable. Regardless, when I consider "is that something I have any reason to suspect is a bottleneck on common workloads?", I don't think of any, and return to working on one of the things I already know is instead.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Greg Smith wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic
>> level, but has not been implemented in that many real world
>> situations. I would encourage its use in Postgres.
>
> I guess, but don't forget that work on PostgreSQL is driven by what
> problems people are actually running into. There's a long list of
> performance improvements sitting in the TODO list waiting for people
> to find time to work on them, ones that we're quite certain are
> useful. That anyone is going to chase after any of these speculative
> ideas from academic research instead of one of those is unlikely.

There are two different things concerning GiST indexes:

1) Larger block sizes mean more entries per GiST page, which results in more generic keys on those pages. That in turn results in a greater number of hits when the index is queried, so a larger part of the index is scanned. NB: this has nothing to do with caching / cache sizes; it holds for every IO model. Tests performed by me showed performance improvements of over 200%. Since then implementing a speedup has been on my 'want to do' list.

2) There are several approaches to getting the number of entries per page down. Two have been suggested in the thread referred to by Matthew: virtual pages (but how to order these?) and a tree within a page. It would be interesting to see whether ideas from Prokop's cache-oblivious algorithms match this problem well enough to yield a suitable virtual page format.

regards,
Yeb Havinga
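Yeb's first point can be sketched in one dimension: a GiST inner node stores the bounding interval of a page's entries, and when insertion order is unclustered, packing more entries into a page widens that interval until it overlaps almost any query. The grouping below is an invented toy, not GiST's real picksplit logic:

```python
def page_keys(entries, page_size):
    """Group entries into pages in insertion order and return each page's
    bounding interval -- the 'generic key' a GiST inner node would store."""
    pages = [entries[i:i + page_size] for i in range(0, len(entries), page_size)]
    return [(min(p), max(p)) for p in pages]

def fraction_of_pages_hit(bounds, q):
    """Fraction of pages whose bounding interval contains query point q."""
    hit = sum(1 for (lo, hi) in bounds if lo <= q <= hi)
    return hit / len(bounds)

# Unclustered insertion order: values jump around the key space, so wide
# pages end up with near-useless bounding keys.
entries = [(i * 37) % 1000 for i in range(1000)]

small = page_keys(entries, 8)     # many pages, tighter bounding keys
large = page_keys(entries, 64)    # few pages, very generic bounding keys

print(fraction_of_pages_hit(small, 500))   # well under 1.0
print(fraction_of_pages_hit(large, 500))   # 1.0: every page must be visited
```

With the large pages, every bounding interval spans nearly the whole key space, so a point query cannot rule out a single page; the smaller pages let most of the index be skipped, matching Yeb's measurement that smaller GiST pages scanned faster.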
Yeb Havinga <yebhavinga@gmail.com> wrote:
> concerning gist indexes:
>
> 1) with larger block sizes and hence, larger # entries per gist
> page, results in more generic keys of those pages. This in turn
> results in a greater number of hits, when the index is queried, so
> a larger part of the index is scanned. NB this has nothing to do
> with caching / cache sizes; it holds for every IO model. Tests
> performed by me showed performance improvements of over 200%.
> Since then implementing a speedup has been on my 'want to do
> list'.

As I recall, the better performance in your tests was with *smaller* GiST pages, right? (The above didn't seem entirely clear on that.)

-Kevin
Greg Smith <greg@2ndquadrant.com> writes:
> Your characterization of the potential speed up here is "Using a proper
> tree inside the index page would improve the CPU usage of the index
> lookups", which seems quite reasonable. Regardless, when I consider
> "is that something I have any reason to suspect is a bottleneck on
> common workloads?", I don't think of any, and return to working on one
> of the things I already know is instead.

Note also that this doesn't do a thing for b-tree indexes, which already have an intelligent within-page structure. So that instantly makes it not a mainstream issue. Perhaps somebody will be motivated to work on it, but most of us are chasing other things.

regards, tom lane
Kevin Grittner wrote:
> As I recall, the better performance in your tests was with *smaller*
> GiST pages, right? (The above didn't seem entirely clear on that.)

Yes, making pages smaller made index scanning faster.

-- 
Yeb
On Fri, Jun 18, 2010 at 1:53 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Matthew Wakeling wrote:
>> This sort of thing has been fairly well researched at an academic level,
>> but has not been implemented in that many real world situations. I would
>> encourage its use in Postgres.
>
> I guess, but don't forget that work on PostgreSQL is driven by what problems
> people are actually running into. There's a long list of performance
> improvements sitting in the TODO list waiting for people to find time to
> work on them, ones that we're quite certain are useful. That anyone is
> going to chase after any of these speculative ideas from academic research
> instead of one of those is unlikely.

This is drifting a bit off-topic for this thread, but it's not so easy to figure out from looking at the TODO which things are actually important. Performance-related improvements are mixed in with non-performance-related improvements, which are mixed in with things that are probably not improvements at all. And even to the extent that you can identify the stuff that's performance-related, it's far from obvious which things are most important. Any thoughts on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
> This is drifting a bit off-topic for this thread, but it's not so easy
> to figure out from looking at the TODO which things are actually
> important. Performance-related improvements are mixed in with
> non-performance related improvements, which are mixed in with things
> that are probably not improvements at all. And even to the extent
> that you can identify the stuff that's performance-related, it's far
> from obvious which things are most important. Any thoughts on that?

I don't think it's off-topic at all actually, and as usual I'll be happy to argue why. Reorganizing the TODO so that it's easier for newcomers to consume is certainly a worthwhile effort in itself, but hard to "fund" (find time to do relative to more important things). My point was more that, statistically, *anything* on that list is likely a better candidate for something to work on usefully than one of the random theoretical performance improvements from research that pop up on the lists from time to time. People get excited about these papers and blog posts sometimes, but the odds of one actually being in the critical path, where it represents a solution to a current PostgreSQL bottleneck, are dramatically lower than the odds that you'll find one reading the list of *known* issues. Want to improve PostgreSQL performance? Spend more time reading the TODO, less looking around elsewhere for problems the database may or may not have.

I have a major time sink I'm due to free myself from this week, and the idea of providing some guidance for a "low hanging performance fruit" section of the TODO is a good one I should take a look at. I have a personal list of that sort already that I should probably just make public, since the ideas for improving things are not the valuable part I should worry about keeping private anyway.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us