Обсуждение: Traffic jams in fn_extra
As an extension with a lot of CPU load, we (postgis) tend to use flinfo->fn_extra a lot, for caching things that are intensiveto calculate at the start of a query and reuse throughout subsequent functions calls. - coordinate projection objects - indexes of the edges of large geometries - other kinds of indexes of the edges of large geometries :) - schemas of lidar pointcloud collections As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions tryingto stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to holdthe different kinds of stuff we might want to store, a GenericCacheCollection https://github.com/postgis/postgis/blob/svn-trunk/libpgcommon/lwgeom_cache.c#L46-L48 As (by now) a connoisseur of fn_extra caching, I've noticed while reading bits of PgSQL code that there are far more placesthat stuff state into fn_extra than I ever knew, and that they do so without any substantial concern that other statemight already be there. (Well, that's not true, they usually check for NULL and they give up if fn_extra is alreadyin use.) The range types, I was surprised to find doing some performance caching in fn_extra. The set-returning functionmacros use it to hold state. And many others I'm sure. Would it be good/wise to add another, better managed, slot to flinfo, one that isn't just void* but is a hash? (Or, has somemanagement macros to handle it and make it a hash* if it's null, whatever API makes sense) so that multiple bits of codecan cache state over function calls without banging into one another? flinfo->fn_extra_hash perhaps? If this sounds OK, I'd be honored to try and make it my first submission to PgSQL. P. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
Paul Ramsey <pramsey@cleverelephant.ca> writes: > As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functions tryingto stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra, to holdthe different kinds of stuff we might want to store, a GenericCacheCollection TBH, I fail to understand what you're on about here. Any one function owns the value of fn_extra in calls to it, and is solely responsible for its usage; furthermore, there's no way for any other code to mangle that pointer unless the owner explicitly makes it available. So where is the problem? And if there is a problem, how does adding another field of exactly the same kind make it better? regards, tom lane
On Tue, Nov 19, 2013 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Paul Ramsey <pramsey@cleverelephant.ca> writes: >> As we've added different kinds of caching, in our own project, we've banged up against problems of multiple functionstrying to stuff information into the same pointer, and ended up putting an extra container of our own into fn_extra,to hold the different kinds of stuff we might want to store, a GenericCacheCollection > > TBH, I fail to understand what you're on about here. Any one function > owns the value of fn_extra in calls to it, and is solely responsible for > its usage; furthermore, there's no way for any other code to mangle that > pointer unless the owner explicitly makes it available. So where is > the problem? And if there is a problem, how does adding another field > of exactly the same kind make it better? Right, sorry, I'm reasoning overly aggressively from the specific to the general. The specific problems have been - two lines of geometry caching code, either of which can be called within a single function, depending on the inputs, which mostly didn't result in conflicts, except when they did, which was eventually rectified by layering a GenericCacheCollection into the function - a cached lidar schema object which would have been very useful to have around in an SRF, but couldn't because the SRF needed the fn_extra slot The first case is an application-specific problem, and since we've coded around it, the only advantage to a pgsql-specific fix would be to allow others who also need to cache several independent things to not reinvent that wheel. The second case is one of the instances where the pgsql code itself is getting in the way and cannot be worked around at the application level. My solution was just not to do caching for that function. So, that's what I perceive the problem to be. Now that you point it out to me, yes, it's pretty small bore stuff. On the solution, I wasn't suggesting another void* slot, but rather a slot that holds a hash table, so that an arbitrary number of things can be stuffed in. Overkill, really, since in 99.9% of times only one thing would be in there, and in the other 0.1% of times two things. In our own GenericCacheCollection, we just statically allocate 16 slots. P.
On 19 November 2013 23:08, Paul Ramsey <pramsey@cleverelephant.ca> wrote: > On the solution, I wasn't suggesting another void* slot, but rather a > slot that holds a hash table, so that an arbitrary number of things > can be stuffed in. Overkill, really, since in 99.9% of times only one > thing would be in there, and in the other 0.1% of times two things. In > our own GenericCacheCollection, we just statically allocate 16 slots. Why do you need to do this dance with fn_extra? It's possible to allocate a hash table in a Transaction-lifetime memory context on first call into a function then cache things there. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi Simon, We do the dance because it’s how we always have and don’t know any other way, any better way. :) The usual explanation. Isthere any place you can point to that demonstrates your technique? Thanks! P -- Paul Ramsey http://cleverelephant.ca/ http://postgis.net/ On Sunday, November 24, 2013 at 8:21 AM, Simon Riggs wrote: > On 19 November 2013 23:08, Paul Ramsey <pramsey@cleverelephant.ca (mailto:pramsey@cleverelephant.ca)> wrote: > > > On the solution, I wasn't suggesting another void* slot, but rather a > > slot that holds a hash table, so that an arbitrary number of things > > can be stuffed in. Overkill, really, since in 99.9% of times only one > > thing would be in there, and in the other 0.1% of times two things. In > > our own GenericCacheCollection, we just statically allocate 16 slots. > > > > Why do you need to do this dance with fn_extra? > > It's possible to allocate a hash table in a Transaction-lifetime > memory context on first call into a function then cache things there. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services
On 24 November 2013 16:02, Paul Ramsey <pramsey@cleverelephant.ca> wrote: > We do the dance because it’s how we always have and don’t know any other way, any better way. :) The usual explanation.Is there any place you can point to that demonstrates your technique? src/backend/utils/mmgr/README You can create memory contexts as children of other contexts, so for example you might create "PostGIS Cache Context" as a sub-context of TopTransactionContext. So it can be created dynamically as needed and will automatically go away at end of xact. Or you could use CurTransactionContext if you want to do things at subtransaction level. This is all used very heavily within Postgres itself, including the various caches in different parts of the code. Obviously, if you start cacheing too much then people will claim that PostGIS is leaking memory, so it depends how far you go. But then you might alleviate that with a postgis.session_cache parameter to acknowledge and allow control. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 24 November 2013 16:02, Paul Ramsey <pramsey@cleverelephant.ca> wrote: >> We do the dance because it�s how we always have and don�t know any other way, any better way. :) The usual explanation.Is there any place you can point to that demonstrates your technique? > src/backend/utils/mmgr/README > You can create memory contexts as children of other contexts, so for > example you might create "PostGIS Cache Context" as a sub-context of > TopTransactionContext. So it can be created dynamically as needed and > will automatically go away at end of xact. The real question of course is whether transaction-level caching is appropriate for what they're storing. If they want only statement-level caching then using fn_extra is often the right thing. Also note that having the cache go away is the easy part. The hard part is knowing whether you've created it yet in the current transaction, and finding it if you have. The usual method is to keep a static variable pointing to it, and plugging into the transaction cleanup callback mechanism with a routine that'll reset the pointer to NULL at transaction end. For examples, look for callers of RegisterXactCallback(). regards, tom lane
On Sunday, November 24, 2013 at 4:42 PM, Tom Lane wrote: > The real question of course is whether transaction-level caching is > appropriate for what they're storing. If they want only statement-level > caching then using fn_extra is often the right thing. I'm not certain it is... we get some great effects out of the statement level stuff, and it really works great except forthose cases where somebody else is already taking the slot (SRF_*) > Also note that having the cache go away is the easy part. The hard part > is knowing whether you've created it yet in the current transaction, and > finding it if you have. The usual method is to keep a static variable > pointing to it, and plugging into the transaction cleanup callback > mechanism with a routine that'll reset the pointer to NULL at transaction > end. For examples, look for callers of RegisterXactCallback(). > I'm glad you said that, because I felt too stoopid to ask :) my previous spelunking through memory contexts showed that it'seasy to get the memory, hard to find it again. Which is why fn_extra is so appealing, it's just sitting there, and theparent context goes away at the end of the query, so it's wonderfully simple to use... if there's not already someonein it. Thanks for the advice, it could be very helpful for my pointcloud work, since having access to a cache object during SRF_*stuff could improve performance quite a bit, so the complexity trade-off of using the transactional context could bewell worth it. P.
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sun, Nov 24, 2013 at 4:21 PM, Simon Riggs <spandir="ltr"><<a href="mailto:simon@2ndquadrant.com" target="_blank">simon@2ndquadrant.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":3tm"style="overflow:hidden">Why do you need to do this dance with fn_extra?<br /><br /> It's possible to allocate ahash table in a Transaction-lifetime<br /> memory context on first call into a function then cache things there.</div></blockquote></div><br/></div><div class="gmail_extra">fn_extra gives a handle per function call site.<br /><br/>It sounds to me like the complexity is coming not from having many Postgres functions but from having lots of infrastructurebacking up those functions. So if many of their Postgres functions call a C function to do various things andthe C function wants to cache something somewhere related to the object they've been passed then the natural thing todo is have the Postgres function pass fn_extra down to the C function but if they have many such C functions it gets abit tricky.<br /><br /><br /></div><div class="gmail_extra">But you could declare fn_extra to be a hash table yourself sinceit's your Postgres function that gets to decide how fn_extra is going to be used. You could then pass that hash tabledown to the various C functions to cache state. However that might still be a bit odd. If you call the same C functiontwice from the same Postgres function it'll get the same hash table for both calls. fn_extra is per Postgres functioncall site.<br /><br clear="all" /><br />-- <br />greg<br /></div></div>