Discussion: Custom cache implemented in a postgresql C function

Custom cache implemented in a postgresql C function

From
Gabi Julien
Date:
Hi,

Here is my problem: I have a postgresql C function that looks like this:

Datum filter(PG_FUNCTION_ARGS);

It takes identifiers and queries a bunch of tables and ends up returning true or false. So far nothing difficult except
that we want better performance. The function was already optimized to the best of my abilities and changing the
structure of the database would not help. However, having a cache would be the perfect solution. I could implement this
cache outside of postgresql if need be but nothing could beat implementing this directly in a postgresql C function.

So this is what I want, a custom cache built into a postgresql C function. Since postgresql uses different processes,
it would be best to use the shared memory. Can this be done safely? At its core, the cache could be considered as simple
as a map protected by a mutex. With postgresql, I first need to initialize some shared memory. This is explained at the
end of this link:

http://www.postgresql.org/docs/8.2/static/xfunc-c.html

However, it sounds like I need to reserve the shared memory in advance using:

void RequestAddinShmemSpace(int size)

In my case, I do not know how big my cache will be. I would preferably allocate the memory dynamically. Is this
possible? In any case, am I trying to reinvent the wheel here? Is there already a shared map or a shared hash structure
available in postgresql?

If shared memory turns out too difficult to use, I could create separate caches for each postgresql process. This
would be a waste of space but it might be better than nothing. In this case, do I need to make my code thread safe? In
other words, is postgresql using more than one thread per process?

Any insights would be more than welcome!
Thank you,
Gabi Julien

Re: Custom cache implemented in a postgresql C function

From
Rob Sargent
Date:
Are you sure your cache needs to grow endlessly?  Otherwise you could use
RequestAddinShmemSpace and manage your map within that space, perhaps
"overwriting" chunks on an LRU basis or a rollover, i.e. grab it all and
do your own management within that single block of shmem.
Caches are best for things revisited often, so old/unused entries ought to be
expendable with little performance loss, at least compared with the
heavy traffic.
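
A minimal sketch of the approach described above: one fixed-size block, claimed up front, in which the least recently used slot is overwritten once the block is full. This is standalone C, not PostgreSQL code. `FixedCache`, `cache_init`, `cache_get`, `cache_put` and the slot count are hypothetical names chosen for illustration. In a backend the block would presumably be the shared memory reserved with RequestAddinShmemSpace, and every access would need a lock, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; none of these names are PostgreSQL APIs. */
#define CACHE_SLOTS 4               /* tiny on purpose so eviction is easy to see */

typedef struct {
    bool     used;
    int64_t  key;                   /* identifier passed to the function */
    bool     value;                 /* cached true/false result */
    uint64_t last_used;             /* logical clock value of the last access */
} CacheSlot;

typedef struct {
    uint64_t  clock;                /* incremented on every hit or insert */
    CacheSlot slots[CACHE_SLOTS];
} FixedCache;

/* Treat the caller's block as the cache and zero it. */
static FixedCache *cache_init(void *block, size_t block_size)
{
    assert(block != NULL && block_size >= sizeof(FixedCache));
    memset(block, 0, sizeof(FixedCache));
    return (FixedCache *) block;
}

/* On a hit, store the cached result in *value and refresh the slot's recency. */
static bool cache_get(FixedCache *c, int64_t key, bool *value)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        CacheSlot *s = &c->slots[i];
        if (s->used && s->key == key) {
            s->last_used = ++c->clock;
            *value = s->value;
            return true;
        }
    }
    return false;
}

/* Insert or update. When no slot is free, overwrite the least recently used. */
static void cache_put(FixedCache *c, int64_t key, bool value)
{
    CacheSlot *victim = NULL;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        CacheSlot *s = &c->slots[i];
        if (s->used && s->key == key) {     /* existing entry: update in place */
            victim = s;
            break;
        }
        /* Prefer a free slot; among used slots prefer the oldest access. */
        if (victim == NULL ||
            (victim->used && (!s->used || s->last_used < victim->last_used)))
            victim = s;
    }
    victim->used = true;
    victim->key = key;
    victim->value = value;
    victim->last_used = ++c->clock;
}
```

With four slots, inserting keys 1 to 4, reading key 1 and then inserting key 5 overwrites key 2, the least recently touched entry. The linear scan keeps the sketch short; a larger cache would want a hash index over the slots.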

Re: Custom cache implemented in a postgresql C function

From
Tom Lane
Date:
Gabi Julien <gabi.julien@broadsign.com> writes:
> In my case, I do not know how big my cache will be.

That makes it awfully hard to use shared memory.

> If shared memory turns out too difficult to use, I could create
> separate caches for each postgresql process.

That's what I'd recommend.  A big advantage of private caches is that
you don't have any need to manage concurrent access, which simplifies
the code and avoids contention.  All the caches that the core Postgres
code maintains are per-backend.

> This would be a waste
> of space but it might be better than nothing. In this case, do I need
> to make my code thread safe? In other words, is postgresql using more
> than one thread per process?

No.

            regards, tom lane
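
A minimal sketch of the per-process cache recommended above, assuming the identifier fits in an int64_t and the result is a boolean. It is standalone C rather than backend code: `cached_filter`, `expensive_filter`, `local_cache` and the table size are hypothetical, and `expensive_filter` merely stands in for the real table lookups. Since each backend is a single-threaded process with its own copy of the table, no locking is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_CACHE_SIZE 1024       /* arbitrary; must be a power of two */
static_assert((LOCAL_CACHE_SIZE & (LOCAL_CACHE_SIZE - 1)) == 0,
              "LOCAL_CACHE_SIZE must be a power of two");

typedef struct {
    bool    used;
    int64_t key;
    bool    value;
} LocalEntry;

/* File-scope static storage: private to the process and zeroed at startup. */
static LocalEntry local_cache[LOCAL_CACHE_SIZE];

/* Counts how often the slow path runs; only here to make the effect visible. */
static int expensive_calls = 0;

/* Hypothetical stand-in for the real work (the queries against the tables). */
static bool expensive_filter(int64_t id)
{
    expensive_calls++;
    return id % 2 == 0;
}

/* Map a key to one slot using a multiplicative hash. */
static size_t slot_for(int64_t key)
{
    uint64_t h = (uint64_t) key * UINT64_C(0x9E3779B97F4A7C15);
    return (size_t) (h >> 32) & (LOCAL_CACHE_SIZE - 1);
}

/* Return the cached answer if present, otherwise compute and remember it. */
static bool cached_filter(int64_t id)
{
    LocalEntry *e = &local_cache[slot_for(id)];

    if (e->used && e->key == id)
        return e->value;            /* hit: skip the expensive path */

    e->used = true;                 /* miss: overwrite whatever was in the slot */
    e->key = id;
    e->value = expensive_filter(id);
    return e->value;
}
```

The table is direct-mapped, so a colliding key simply overwrites the older entry. Nothing is ever invalidated, which is only acceptable if stale answers can be tolerated until the session ends. Inside a real backend, PostgreSQL's own dynahash tables (hash_create) are a likely alternative to a hand-rolled array.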

Re: Custom cache implemented in a postgresql C function

From
Alban Hertroys
Date:
On 21 Oct 2010, at 1:44, Gabi Julien wrote:

> Hi,
>
> Here is my problem: I have a postgresql C function that looks like this:
>
> Datum filter(PG_FUNCTION_ARGS);
>
> It takes identifiers and queries a bunch of tables and ends up returning true or false. So far nothing difficult
> except that we want better performance. The function was already optimized to the best of my abilities and changing the
> structure of the database would not help. However, having a

That sounds like your function would classify as a STABLE function within Postgres, did you define it as such? Postgres
will cache the results of STABLE (and IMMUTABLE) functions all by itself, in which case you may not need your custom
cache. The default is to classify a function as VOLATILE, meaning the results aren't suitable for caching.

Another possible solution is to store the results of your function (or of the queries it performs) in a separate
table[1] that would function as a cache of sorts. The benefit is that the table gets managed by Postgres, so you won't
have to worry about stuff like spilling to disk if the cache grows too large to fit in (available) memory.

[1] A TEMP TABLE wouldn't work, as it isn't visible to other sessions, although you could create one per session of
course.

Of course, with a custom cache you have more control over how it behaves, so that may still be your best solution.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


Re: Custom cache implemented in a postgresql C function

From
Tom Lane
Date:
Alban Hertroys <dalroi@solfertje.student.utwente.nl> writes:
> That sounds like your function would classify as a STABLE function
> within Postgres, did you define it as such? Postgres will cache the
> results of STABLE (and IMMUTABLE) functions all by itself, in which
> case you may not need your custom cache.

Uh, no it won't.  It will pre-evaluate immutable functions that are
called with constant arguments, which is not the same thing at all.

            regards, tom lane

Re: Custom cache implemented in a postgresql C function

From
"A.M."
Date:
On Oct 20, 2010, at 7:44 PM, Gabi Julien wrote:

> In my case, I do not know how big my cache will be. I would preferably allocate the memory dynamically. Is this
> possible? In any case, am I trying to reinvent the wheel here? Is there already a shared map or a shared hash structure
> available in postgresql?
>
> If shared memory turns out too difficult to use, I could create separate caches for each postgresql process. This
> would be a waste of space but it might be better than nothing. In this case, do I need to make my code thread safe? In
> other words, is postgresql using more than one thread per process?

Apart from the other suggestions made, another option could be to use your own shared memory which you allocate and
manage yourself (without postgresql managing it). You could implement a simple least-recently-used cache to purge old
entries as the cache grows.

Cheers,
M

Re: Custom cache implemented in a postgresql C function

From
Gabi Julien
Date:
Thanks to all of you. This was very good feedback. I'll use the one cache per process suggestion of Tom Lane. This will
be the easiest to implement.

On Thursday 21 October 2010 11:14:40 A.M. wrote:
> Apart from the other suggestions made, another option could be to use your own shared memory which you allocate and
> manage yourself (without postgresql managing it). You could implement a simple least-recently-used cache to purge old
> entries as the cache grows.
>
> Cheers,
> M