Re: Postgresql Caching

Поиск
Список
Период
Сортировка
От Harvell F
Тема Re: Postgresql Caching
Дата
Msg-id ACE29058-CD5E-41D2-A356-FA140A1CA4EA@file13.info
обсуждение исходный текст
Ответ на Re: Postgresql Caching  (Shane Ambler <pgsql@007Marketing.com>)
Список pgsql-hackers
On 16 Oct 2006, at 4:29, Shane Ambler wrote:

> Harvell F wrote:
>
>>   Getting back to the original posting, as I remember it, the  
>> question was about seldom changed information.  In that case, and  
>> assuming a repetitive query as above, a simple query results cache  
>> that is keyed on the passed SQL statement string and that simply  
>> returns the previously cooked result set would be a really big  
>> performance win.
>
> I believe the main point that Mark made was the extra overhead is  
> in the sql parsing and query planning - this is the part that  
> postgres won't get around. Even if you setup simple tables for  
> caching it still goes through the parser and planner and looses the  
> benefits that memcached has. Or you fork those requests before the  
> planner and loose the benefits of postgres.
> The main benefit of using memcached is to bypass the parsing and  
> query planning.
  That was the basis of my suggestion to just use the passed query  
string as the key.  No parsing or processing of the query, just a  
simple string match.

> You will find there is more to sql parsing than you first think, it  
> needs to find the components that make up the sql statement (tables  
> column names functions) and check that they exist and can be used  
> in the context of the given sql and the given data matches the  
> context that is given to be used in, it needs to check that the  
> current user has enough privileges to perform the requested task,  
> then it locates the data whether it be in the memory cache, on disk  
> or an integrated version of memcached, this would also include  
> checks to make sure another user hasn't locked the data to change  
> it and whether there exists more than one version of the data,  
> committed and uncommitted and then sends the results back to the  
> client requesting it.
  The user permissions checking is a potential issue but again, for  
the special case of repeated queries by the same user (the webserver  
process) for the same data, a simple match of the original query  
string _and_ the original query user, would still be very simple.   
The big savings by having the simple results cache would be the  
elimination of the parsing, planning, locating, combining, and  
sorting of the results set.
  I don't believe normal locking plays a part in the cache (there  
are basic cache integrity locking issues though) nor does the  
versioning or commit states, beyond the invalidation of the cache  
upon a commit to a referenced table.  It may be that the invalidation  
needs to happen whenever a table is locked as well.  (The hooks for  
the invalidation would be done during the original caching of the  
results set.)
  I know that the suggestion is a very simple minded suggestion and  
is limited to a very small subset of the potential query types and  
interactions, however, at least for web applications, it would be a  
very big win.  Many website want to display today's data on their  
webpage and have it change as dates change (or as users change).  The  
data in the source table doesn't change very often (especially  
compared to a popular website) and the number of times that the exact  
same query could be issued between changes can measure into the  
hundreds of thousands or more.  Putting even this simple results  
cache into the database would really simplify the programmer's life  
and improve reliability (and the use of PostgreSQL).




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [BUGS] BUG #2683: spi_exec_query in plperl returns
Следующее
От: Neil Conway
Дата:
Сообщение: Re: Postgresql Caching