Re: The science of optimization in practical terms?

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: The science of optimization in practical terms?
Дата
Msg-id Pine.GSO.4.64.0902151238070.1312@westnet.com
обсуждение исходный текст
Ответ на Re: The science of optimization in practical terms?  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: The science of optimization in practical terms?
Re: The science of optimization in practical terms?
Список pgsql-hackers
On Fri, 13 Feb 2009, Robert Haas wrote:

> Gather statistics on relation access patterns and use that to estimate 
> the fraction of a relation likely to be in cache.

At one point I had a hacked background writer that collected statistics 
about the contents of the buffer cache.  Since it's obtaining a lock on 
the buffer header anyway, it's a perfectly good place to note what 
relfileid the buffer is associated with.  If you set aside some fixed 
amount of space to hold information about the most popular relations 
(perhaps using a continuous top-k model, see 
http://www.mysmu.edu/faculty/kyriakos/topk-SIGMOD06.pdf ), you can end up 
with enough data to estimate how much data in shared_buffers exists for 
the most cached relations in there.

In a typical recommended tuning nowadays, we can only expect that to 
sample about 1/3 of the total caching happening (presuming 
shared_buffers=1/4 RAM and effective_cache_size~=3/4 RAM).  While in 
general it's nice to think that shared_buffers has a similar makeup to 
what the OS is caching, it's not hard to discover common cases where this 
would not be the case.  Particularly given the VACUUM/seq scan ring-buffer 
improvements in 8.3, it's easy to imagine scanning a table that's 
2*shared_buffers in size showing only 256KB in shared_buffers, while the 
whole thing is available in the OS cache.

I had a eureka moment where I realized I could hook the buffer eviction 
code to model that.  Decrement the count for that relation in the main 
top-k count, then have a second count that assumes the last 
2*shared_buffers evicted are also still cached.  That would accurately 
model the ring-buffer case and improve the quality of the model in 
general.  Updating those stats on every eviction would add some overhead, 
but if the background writer is doing enough of them for you that should 
at least be asynchronous from when most backends are blocked waiting for 
an eviction.

And that's as far as I got before I had to return to real work again.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: The science of optimization in practical terms?