Re: How does PG know if data is in memory?

Поиск
Список
Период
Сортировка
От
Тема Re: How does PG know if data is in memory?
Дата
Msg-id 20101011231134.ANB76937@ms14.lnh.mail.rcn.net
обсуждение исходный текст
Ответ на Re: How does PG know if data is in memory?  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: How does PG know if data is in memory?  (Robert Haas <robertmhaas@gmail.com>)
Re: How does PG know if data is in memory?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
An approach that works can be found in DB2, and likely elsewhere.

The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term).  A
tablespace/bufferpoolmatch is defined.  Then tables and indexes are assigned to the tablespace (and implicitly, the
bufferpool). As a result, one can effectively pin data in memory.  This is very useful, but not low hanging fruit to
implement.

The introduction of rudimentary tablespaces is a first step.  I assumed that the point was to get to a DB2-like
structureat some point.  Yes? 

Robert

---- Original message ----
>Date: Mon, 11 Oct 2010 22:59:28 -0400
>From: pgsql-performance-owner@postgresql.org (on behalf of Robert Haas <robertmhaas@gmail.com>)
>Subject: Re: [PERFORM] How does PG know if data is in memory?
>To: Jeremy Harris <jgh@wizmail.org>
>Cc: pgsql-performance@postgresql.org
>
>On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris <jgh@wizmail.org> wrote:
>> On 10/04/2010 04:22 AM, Greg Smith wrote:
>>>
>>> I had a brain-storming session on this subject with a few of the hackers
>>> in the community in this area a while back I haven't had a chance to do
>>> something with yet (it exists only as a pile of scribbled notes so far).
>>> There's a couple of ways to collect data on what's in the database and OS
>>> cache, and a couple of ways to then expose that data to the optimizer. But
>>> that needs to be done very carefully, almost certainly as only a manual
>>> process at first, because something that's producing cache feedback all of
>>> the time will cause plans to change all the time, too. Where I suspect this
>>> is going is that we may end up tracking various statistics over time, then
>>> periodically providing a way to export a mass of "typical % cached" data
>>> back to the optimizer for use in plan cost estimation purposes. But the idea
>>> of monitoring continuously and always planning based on the most recent data
>>> available has some stability issues, both from a "too many unpredictable
>>> plan changes" and a "ba
>>
>> d
>>>
>>> short-term feedback loop" perspective, as mentioned by Tom and Kevin
>>> already.
>>
>> Why not monitor the distribution of response times, rather than "cached" vs.
>> not?
>>
>> That a) avoids the issue of discovering what was a cache hit  b) deals
>> neatly with
>> multilevel caching  c) feeds directly into cost estimation.
>
>I was hot on doing better cache modeling a year or two ago, but the
>elephant in the room is that it's unclear that it solves any
>real-world problem.  The OP is clearly having a problem, but there's
>not enough information in his post to say what is actually causing it,
>and it's probably not caching effects.  We get occasional complaints
>of the form "the first time I run this query it's slow, and then after
>that it's fast" but, as Craig Ringer pointed out upthread, not too
>many.  And even with respect to the complaints we do get, it's far
>from clear that the cure is any better than the disease.  Taking
>caching effects into account could easily result in the first
>execution being slightly less slow and all of the subsequent
>executions being moderately slow.  That would not be an improvement
>for most people.  The reports that seem really painful to me are the
>ones where people with really big machines complain of needing HOURS
>for the cache to warm up, and having the system bogged down to a
>standstill until then.  But changing the cost model isn't going to
>help them either.
>
>--
>Robert Haas
>EnterpriseDB: http://www.enterprisedb.com
>The Enterprise PostgreSQL Company
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: How does PG know if data is in memory?
Следующее
От: Neil Whelchel
Дата:
Сообщение: Re: Slow count(*) again...