Re: Postgresql Caching

From: Harvell F
Subject: Re: Postgresql Caching
Date:
Msg-id: 24C93EEC-0509-4EF5-90CD-99C1D836E56C@file13.info
In reply to: Re: Postgresql Caching  (mark@mark.mielke.cc)
Responses: Re: Postgresql Caching  (Shane Ambler <pgsql@007Marketing.com>)
List: pgsql-hackers
On 15 Oct 2006, at 19:55, mark@mark.mielke.cc wrote:

> On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
>> mark@mark.mielke.cc wrote:
>>> As a thought experiment, I'm not seeing the benefit. I think if you
>>> could prove a benefit, then any proof you provided could be used to
>>> improve the already existing caching layers, and would apply equally
>>> to read-only or read-write pages. For example, why not be able to
>>> hint to PostgreSQL that a disk-based table should be considered a
>>> priority to keep in RAM. That way, PostgreSQL would avoid pushing
>>> pages from this table out.
>> If memcached (or pgmemcached implemented in triggers) can show a speed
>> improvement using ram based caching (even with network overhead) of
>> specific data then it stands to reason that this ram based cache can be
>> integrated into postgres with better integration that will overcome the
>> issues that pgmemcached has.
>
> ...
> I think the memcache people are thinking that the cost of PostgreSQL is
> about the disk. Although the disk plays a part, I'm pretty sure it's
> only a fraction. Not providing transaction guarantees, not providing an
> SQL level abstraction, and not having multiple processes or threads
> plays a much bigger part.
>
  Forgive my intrusion and perhaps simplistic viewpoint; however, improved caching would be of great benefit for me as a web developer.
  I wholeheartedly agree that disk IO is often a small part of the expense of obtaining data from the database, especially for the typical web-based application.  Query parsing, joining, sorting, etc. are the more likely culprits.  The existing caching mechanisms (as I understand them), and especially the kernel disk caches, do nothing to eliminate these overhead costs.
  I would venture that the 80/20 rule applies here, as in many, many other instances.  A full 80+% of the queries performed against the database are performed over and over again with the same criteria for a period of time, and then the criteria change for the next period of time.  This would be particularly true for seldom-changed tables that, for example, contain a list of the day's advertisements.  The data changes slowly, once a day or once a week, but a query is made for every page hit.  Usually the exact same query.
  I know, for you purists out there, that this is an obvious call for an application-level cache.  Perhaps so; however, it complicates the end-programmer environment _and_ it has consistency disadvantages.  Many of the programming languages being used provide direct interfaces to PostgreSQL (not surprising given that the programmers are using PostgreSQL) and some may even provide a caching mechanism.  Best case, integrating the two remains a task for the end-programmer; worst case, the end-programmer has to implement the cache as well.  Rolling a cache into the database removes that complexity by incorporating it into the existing PostgreSQL API.  (BTW, I'm aware that the consistency disadvantages of the application-level cache could probably be overcome by implementing notify in the cache, but, again, at added end-programmer expense.)
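  To make the notify-based consistency idea concrete, here is a minimal in-process sketch of an application-level cache that is invalidated by notifications.  In a real deployment the notification would arrive via PostgreSQL's LISTEN/NOTIFY rather than a local callback; all names here (QueryCache, fetch_ads, the "ads_changed" channel) are illustrative, not a real API.

```python
# Hypothetical sketch of an application-level query cache kept consistent
# via notifications. The notify channel is simulated in-process; a real
# system would drive invalidation from PostgreSQL's LISTEN/NOTIFY.

class QueryCache:
    def __init__(self):
        self._cache = {}      # SQL string -> result rows
        self._listeners = {}  # channel name -> list of callbacks

    def listen(self, channel, callback):
        self._listeners.setdefault(channel, []).append(callback)

    def notify(self, channel):
        # Stand-in for a NOTIFY arriving from the server after a commit.
        for cb in self._listeners.get(channel, []):
            cb()

    def get(self, sql, fetch):
        # Serve from cache; on a miss, run the real query via fetch().
        if sql not in self._cache:
            self._cache[sql] = fetch()
        return self._cache[sql]

    def invalidate_all(self):
        self._cache.clear()

cache = QueryCache()
cache.listen("ads_changed", cache.invalidate_all)

calls = []
def fetch_ads():
    calls.append(1)               # count trips to the database
    return [("ad1",), ("ad2",)]

cache.get("SELECT * FROM ads", fetch_ads)  # miss: runs the query
cache.get("SELECT * FROM ads", fetch_ads)  # hit: no database trip
cache.notify("ads_changed")                # writer announces an update
cache.get("SELECT * FROM ads", fetch_ads)  # miss again after invalidation
```

  The point of the sketch is the cost it shifts onto the end-programmer: someone has to wire every writer to the right channel, which is exactly the added expense noted above.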
  Getting back to the original posting, as I remember it, the question was about seldom-changed information.  In that case, and assuming a repetitive query as above, a simple query results cache that is keyed on the passed SQL statement string and that simply returns the previously cooked result set would be a really big performance win.
  Registering each cache entry by the tables included in the query, and invalidating those entries on any committed update or insert transaction against any of those tables, would transparently solve the consistency problem.
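  The two paragraphs above can be sketched as a small data structure: results keyed on the raw SQL string, plus a reverse index from table name to the cache keys that read it, consulted at commit time.  This is a hedged illustration of the proposal, not PostgreSQL internals; the class and method names are invented for the sketch.

```python
# Illustrative sketch of the proposed server-side results cache: entries
# keyed on the SQL string, registered by the tables they touch, and
# invalidated when a write to any of those tables commits.

class ResultsCache:
    def __init__(self):
        self._results = {}   # SQL string -> cooked result set
        self._by_table = {}  # table name -> set of SQL keys reading it

    def lookup(self, sql):
        # A repeat of the exact same query is served straight from here.
        return self._results.get(sql)

    def store(self, sql, tables, rows):
        self._results[sql] = rows
        for t in tables:
            self._by_table.setdefault(t, set()).add(sql)

    def on_commit_write(self, table):
        # Called when a transaction that updated/inserted into `table`
        # commits: drop every cached query that read that table.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)

cache = ResultsCache()
sql = "SELECT * FROM ads WHERE day = 'mon'"
cache.store(sql, ["ads"], [("ad1",), ("ad2",)])
assert cache.lookup(sql) is not None  # every page hit reuses the result
cache.on_commit_write("ads")          # a committed UPDATE on ads
assert cache.lookup(sql) is None      # entry transparently invalidated
```

  Because invalidation hangs off commit, readers never see a stale result set, which is what makes the cache transparent to the end-programmer.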
  Does this improve the "more interesting" case of heavily updated tables?  Not likely; however, for many web applications it will likely improve 80% of the queries, leaving more cycles (and bandwidth) available for the non-cacheable queries.
  There would be other issues as well: for example, non-invalidated cache entries will accumulate rapidly if the criteria change often, large result sets will cause cache contention, cursors will (likely) be unable to use the cache, syntax/commands for manipulating cacheability would be needed, etc.  THIS DOES NOT ELIMINATE THE BASIC VALUE of a results cache under the conditions specified above, conditions that I would venture to say make up a large part of the queries that are (or could be) made by a web application.

Thanks,  F

--
F Harvell
407 467-1919




