Re: B-Heaps

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: B-Heaps
Дата
Msg-id 4C17206B.9030606@2ndquadrant.com
обсуждение исходный текст
Ответ на B-Heaps  (Eliot Gable <egable+pgsql-performance@gmail.com>)
Ответы Re: B-Heaps
Re: B-Heaps
Список pgsql-performance
Eliot Gable wrote:
> Just curious if this would apply to
> PostgreSQL: http://queue.acm.org/detail.cfm?id=1814327

It's hard to take this seriously at all when it's so ignorant of actual
research in this area.  Take a look at
http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf
for a second, specifically page 9.  See the "van Emde Boas" layout?
That's basically the same as what this article is calling a B-heap, and
the idea goes back to at least 1977.  As you can see from that paper,
the idea of using it to optimize for multi-level caches goes back to at
least 2001.  Based on the performance number, it seems a particularly
helpful optimization for the type of in-memory caching that his Varnish
tool is good at, so kudos for reinventing the right wheel.  But that's
an environment with one level of cache:  you're delivering something
from memory, or not.  You can't extrapolate from what works for that
very far.

> So, how does PostgreSQL deal with the different latencies involved in
> accessing data on disk for searches / sorts vs. accessing data in
> memory? Is it allocated in a similar way as described in the article
> such that disk access is reduced to a minimum?

PostgreSQL is modeling a much more complicated situation where there are
many levels of caches, from CPU to disk.  When executing a query, the
database tries to manage that by estimating the relative costs for CPU
operations, row operations, sequential disk reads, and random disk
reads.  Those fundamental operations are then added up to build more
complicated machinery like sorting.  To minimize query execution cost,
various query plans are considered, the cost computed for each one, and
the cheapest one gets executed.  This has to take into account a wide
variety of subtle tradeoffs related to whether memory should be used for
things that would otherwise happen on disk.  There are three primary
ways to search for a row, three main ways to do a join, two for how to
sort, and they all need to have cost estimates made for them that
balance CPU time, memory, and disk access.

The problem Varnish is solving is most like how PostgreSQL decides what
disk pages to keep in memory, specifically the shared_buffers
structure.  Even there the problem the database is trying to solve is
quite a bit more complicated than what a HTTP cache has to deal with.
For details about what the database does there, see "Inside the
PostgreSQL Buffer Cache" at http://projects.2ndquadrant.com/talks

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


В списке pgsql-performance по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: B-Heaps
Следующее
От: "A. Kretschmer"
Дата:
Сообщение: Re: B-Heaps