Обсуждение: Extended customizing, SQL functions, internal variables, API
I'm not really sure what to call this feature, but I have been talking to a potential customer and they need a particular feature and they need to to be very FAST. Take this query: select sum(num) from table; Now, if that table is very large, this can take a lot of time. Using a trigger, one can update a summary table for changes, i.e. delete, update, insert. Problem with this is each transaction to the summary table creates a new row. If you are doing a 100 updates, inserts, deletes a second, this soon starts to take a long time to access. Frequent vacuuming now has to happen. It occurs to me that there is a need for internal state variables that can be accessed either by functions or something similar. At PostgreSQL start time, some subsystem allocates and initializes internal variables. Child processes of PostgreSQL, have this sort of metaphore: >>>>>>>>>>> VAR.c >>>>>>>>>>>>> /* called at init time, create var if nessisary */ PGVAR *var = var_allocate('varname', size); if( var_lock(var) ) { /* do something worth while */ var_unlock(var); } <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>> VAR.sql >>>>>>>>> SELECT var_varname; Does anyone see a need for this?
On Fri, May 28, 2004 at 12:46:29 -0400, pgsql@mohawksoft.com wrote: > It occurs to me that there is a need for internal state variables that can > be accessed either by functions or something similar. But there still needs to be multiple copies to take into account that different transactions may need to see different values of the same variable.
> On Fri, May 28, 2004 at 12:46:29 -0400, > pgsql@mohawksoft.com wrote: >> It occurs to me that there is a need for internal state variables that >> can >> be accessed either by functions or something similar. > > But there still needs to be multiple copies to take into account that > different transactions may need to see different values of the same > variable. > Yea, what I'm about to say will cause a lot of people to disagree with me, and I don't even like the idea for some very small set of examples, but.... No transactions. I know this is a very bad thing, and I hate even thinking about it, but there is a real "need" for this sort of function in some very limited cases. Let me exaplin, and this really isn't a SQL issue, so much as flexability to break some rules issue. My client is sold on PostgreSQL, it works for them perfectly with one exception. (I have to be careful about NDA stuff here) The have a database of information that is coming in at a high speed regular basis. One bit of information is a value. To get this value they must perform SELECT sum(field) FROM table. Well, this simply does not scale. They've used a trigger system with a small summary table where they update, the number in the sumary field. That works fine, except, that after a few thousand updates, the SELECT time takes a while. Then they have to vacuum constanty. It just seems like an ugly and wastefull process. There is a quick solution, create an internal variable in shared memory that can be seen by all back-end processes. It is protected by a mutex. Now, I could roll my own system pretty easily, and probably will do so. It won't take too much, however, it would be neat if this was in PostgreSQL. I fully expect that people would worry about this, and I don't blame them. It is a *bad* idea. Like I said, I could roll my own, but I'm curious if anyone else sees any benefit to this feature. If it is a feature that people want, it would best be done from within PostgreSQL. If it is not something generally wanted, then I'll keep it here or try to get it on gborg or pgfoundary.
On Friday 28 May 2004 04:08 pm, pgsql@mohawksoft.com wrote: > > No transactions. > Then it won't work. Let's say we take your example. The variable "s" is supposed to be the sum of a column in a table. When someone modifies the data in the table, when is "s" adjusted? What if it is modified but the transaction is aborted? Who sees the "correct" value of "s" at any point in time? -- Jonathan Gardner jgardner@jonathangardner.net
> On Friday 28 May 2004 04:08 pm, pgsql@mohawksoft.com wrote: >> >> No transactions. >> > > Then it won't work. Let's say we take your example. The variable "s" is > supposed to be the sum of a column in a table. > > When someone modifies the data in the table, when is "s" adjusted? What if > it is modified but the transaction is aborted? Who sees the "correct" > value > of "s" at any point in time? Like I said, it is a *bad* idea. No one is denying this, it "breaks" all the rules, but "with care" I can see a limited, but valuable function. > > -- > Jonathan Gardner > jgardner@jonathangardner.net >
pgsql@mohawksoft.com writes: > > On Fri, May 28, 2004 at 12:46:29 -0400, > > pgsql@mohawksoft.com wrote: > > > >> It occurs to me that there is a need for internal state variables that > >> can be accessed either by functions or something similar. What you're describing is called "materialized views" and indeed a full featured implementation capable of caching things like sum() and count() would be a very nice thing to have. But it's also a lot of work. A partial solution handling just your setup could be done using triggers like you describe, but it has the major disadvantage of serializing all insert/delete/updates on the source table. They all become serialized around the record with the cache of the sum/count. > > But there still needs to be multiple copies to take into account that > > different transactions may need to see different values of the same > > variable. > > Yea, what I'm about to say will cause a lot of people to disagree with me, > and I don't even like the idea for some very small set of examples, > but.... > > No transactions. Well sure you can do that, in which case there are other systems that are more appropriate than postgres. I would suggest you look at memcached or perhaps mysql. > Then they have to vacuum constantly. It just seems like an ugly and wasteful > process. If you were using other databases the equivalent work would have to happen in the middle of the critical path of the transaction, which is even uglier and more wasteful. If you're using 7.4 the new pg_autovacuum daemon will handle this for you, you can pretend it isn't happening. The only alternative is giving up transactions, like you say, in which case you may as well use a tool that doesn't spend so much effort providing them. -- greg
On Saturday 29 May 2004 04:38, pgsql@mohawksoft.com wrote: > Now, I could roll my own system pretty easily, and probably will do so. It > won't take too much, however, it would be neat if this was in PostgreSQL. > > I fully expect that people would worry about this, and I don't blame them. > It is a *bad* idea. Like I said, I could roll my own, but I'm curious if > anyone else sees any benefit to this feature. If it is a feature that > people want, it would best be done from within PostgreSQL. If it is not > something generally wanted, then I'll keep it here or try to get it on > gborg or pgfoundary. I agree that it could be a nice feature. But it reminds me a quote from a C++ FAQ I read once. ---------- *. Should I use exception for error handling? Ans. The real question is can I afford stack unwinding here... ---------- The situation is similar here. When you want something in database, one question is to ask is do I need MVCC here? Of course depending upon the application context the answer well could be yes. But at a lot of places, this could be easily be managed in application and probably better be done so. Personally I do not think managing such information in application is an hack. Just a thought... Shridhar
> On Saturday 29 May 2004 04:38, pgsql@mohawksoft.com wrote: >> Now, I could roll my own system pretty easily, and probably will do so. >> It >> won't take too much, however, it would be neat if this was in >> PostgreSQL. >> >> I fully expect that people would worry about this, and I don't blame >> them. >> It is a *bad* idea. Like I said, I could roll my own, but I'm curious if >> anyone else sees any benefit to this feature. If it is a feature that >> people want, it would best be done from within PostgreSQL. If it is not >> something generally wanted, then I'll keep it here or try to get it on >> gborg or pgfoundary. > > I agree that it could be a nice feature. But it reminds me a quote from a > C++ > FAQ I read once. > > ---------- > *. Should I use exception for error handling? > > Ans. The real question is can I afford stack unwinding here... > ---------- > > The situation is similar here. When you want something in database, one > question is to ask is do I need MVCC here? I am a HUGE C/C++ guy. A lot of people dump on C++ because it lets you shoot yourself in the foot with both barrels without even knowing how ... if you are not careful. Oddly enough, this very flexability is what makes it a very powerful and useful language. Similarly, sometimes, it is very difficult to make PostgreSQL do some of the things that you need too. > > Of course depending upon the application context the answer well could be > yes. > But at a lot of places, this could be easily be managed in application and > probably better be done so. > > Personally I do not think managing such information in application is an > hack. > > Just a thought... I may have a unique view of PostgreSQL, but I don't think of it as just a database. I think of it as a great applications platform. Most applications today are data centric, and postgresql fits this bill perfectly. As a data layer it is hard to beat. The fact that it is a great SQL database in its own right, is a huge bonus. That being said, every now and then it lacks this small little feature that working around takes a bit of work. Like functions returning multiple results, that was huge. Functions returning rows, that was huge too. Having internal PostgreSQL variables that are not present on disk, or maybe, variables that are mirrored on disk may be good. The whole reason why I made this post was to see if other people have had similar issues and looked for a similar solution, and to think about if there is a solution that fits within PostgreSQL and how it would work.
pgsql@mohawksoft.com writes: > Having internal PostgreSQL variables that are not present on disk, or > maybe, variables that are mirrored on disk may be good. I don't think there's anything wrong with your idea, and there are numerous good solutions that implement it already. But what makes you think this belongs in Postgres? There are plenty of memory and disk based shared databases that are non-transactional and non-relational and meant for storing just this kind of non-relational data. Some are much faster than postgres for simple non-concurrent one-record lookups and updates like this. Use the right tool for the job. Don't try to make one tool do everything, especially something that's anathema to its basic design. -- greg
> On Saturday 29 May 2004 18:10, pgsql@mohawksoft.com wrote: >> Having internal PostgreSQL variables that are not present on disk, or >> maybe, variables that are mirrored on disk may be good. > > Yes. I agree. I can see why you proposed no transactions few posts ago. > Take > an example of a count variable. It may not have transactions but it is > expected not to be very accurate anyways. > > If I can declare variables which can be changed/read in locked fashion and > visible to all the backends would be a real great use. It shouldn't have > transactions because it is not data but a state. It is in database so that > other connections and stored procedures could see it. > > Coupled with the fact that postgresql has custom data types, there is no > end > how this could be put to use. Lot more things that sit in application > layer > will be inside postgresql, I can image. > >> The whole reason why I made this post was to see if other people have >> had >> similar issues and looked for a similar solution, and to think about if >> there is a solution that fits within PostgreSQL and how it would work. > > AFAIK, there is no way of doing it in postgresql. But I would love to see > it > happen. (I wish I could work on it...:-( ) > I was thinking that it could be done as a contrib/pgfoundary function. I think, but am not sure, that a function can be pre-loaded into the main postgresql backend (postmaster) prior to starting other connections. I'll have to check that out, but I think it is true, if it isn't it should be possible to modify postmaster to do so. The function module, when loaded, looks for its shared memory block, if none is found, then it procededs to create the block, mutex, etc. Then, it is a fairly simple task of managing a name/value table protected by mutex. Make sense?
On Saturday 29 May 2004 20:48, pgsql@mohawksoft.com wrote: > I was thinking that it could be done as a contrib/pgfoundary function. I > think, but am not sure, that a function can be pre-loaded into the main > postgresql backend (postmaster) prior to starting other connections. I'll > have to check that out, but I think it is true, if it isn't it should be > possible to modify postmaster to do so. > > The function module, when loaded, looks for its shared memory block, if > none is found, then it procededs to create the block, mutex, etc. > > Then, it is a fairly simple task of managing a name/value table protected > by mutex. > > Make sense? This is effectively a stored procedure isn't it? I have few points on this approach * Every backend has to load it or somebody has to call it as a coding standard. Even if postmaster preloads it I doubt it will invoke it but correct me if I am wrong. * This approach puts the code that you would have put in application otherwise. I mean it makes sense but just restating for my clarification * How do you expose pg datatypes to such a system? Or keep it limited to basic variables? * How do you access it from other stored procedures and SQL commands? If you do it in C, may be C routines can use it but what about other procedural language? I think you would need SPI co-operation. All in all looks good starting point to me. But it won't be postgresql enough but a custom C function. ( That is useful as well though) Shridhar
> > pgsql@mohawksoft.com writes: > >> Having internal PostgreSQL variables that are not present on disk, or >> maybe, variables that are mirrored on disk may be good. > > I don't think there's anything wrong with your idea, and there are > numerous > good solutions that implement it already. But what makes you think this > belongs in Postgres? > > There are plenty of memory and disk based shared databases that are > non-transactional and non-relational and meant for storing just this kind > of > non-relational data. Some are much faster than postgres for simple > non-concurrent one-record lookups and updates like this. > > Use the right tool for the job. Don't try to make one tool do everything, > especially something that's anathema to its basic design. I agree completely with one caveat, when the best tool for the job lacks a feature what do you do? > > -- > greg > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
pgsql@mohawksoft.com writes: > I agree completely with one caveat, when the best tool for the job lacks a > feature what do you do? You're missing the point. The feature you want has nothing to do with relational databases. It has everything to do with in-memory non-transactional non-relational databases. These things exist but they're not postgres. Postgres just isn't the best tool for what you want to do. Try memcached or any of the other very fast non-persistent non-transactional in-memory databases. If you try to use postgres to do this you'll find -- as you just did -- that you've bought a lot of overhead for things you don't want. Because it's not the appropriate tool. -- greg
> > pgsql@mohawksoft.com writes: > >> I agree completely with one caveat, when the best tool for the job lacks >> a >> feature what do you do? > > You're missing the point. The feature you want has nothing to do with > relational databases. It has everything to do with in-memory > non-transactional > non-relational databases. These things exist but they're not postgres. > > Postgres just isn't the best tool for what you want to do. > > Try memcached or any of the other very fast non-persistent > non-transactional > in-memory databases. If you try to use postgres to do this you'll find -- > as > you just did -- that you've bought a lot of overhead for things you don't > want. Because it's not the appropriate tool. That's the problem. It easy to say, in effect, this isn't the job of the database. Yet, the information is based on what's in the database. It is one of those ambiguous things that life is so anoyingly full of. It's really data related, so it should be in the database, it's really the application's place to do this, so it should be in the application. When all is said and done, I would say it is "too" data related to be so separated from the database. Remember, PostgreSQL was chosen for the vast number of advantages, this is just one small issues.
On Saturday 29 May 2004 18:10, pgsql@mohawksoft.com wrote: > Having internal PostgreSQL variables that are not present on disk, or > maybe, variables that are mirrored on disk may be good. Yes. I agree. I can see why you proposed no transactions few posts ago. Take an example of a count variable. It may not have transactions but it is expected not to be very accurate anyways. If I can declare variables which can be changed/read in locked fashion and visible to all the backends would be a real great use. It shouldn't have transactions because it is not data but a state. It is in database so that other connections and stored procedures could see it. Coupled with the fact that postgresql has custom data types, there is no end how this could be put to use. Lot more things that sit in application layer will be inside postgresql, I can image. > The whole reason why I made this post was to see if other people have had > similar issues and looked for a similar solution, and to think about if > there is a solution that fits within PostgreSQL and how it would work. AFAIK, there is no way of doing it in postgresql. But I would love to see it happen. (I wish I could work on it...:-( ) Shridhar
>>>>> "pgsql" == pgsql <pgsql@mohawksoft.com> writes: pgsql> The have a database of information that is coming in at a pgsql> high speed regular basis. One bit of informationis a pgsql> value. To get this value they must perform SELECT pgsql> sum(field) FROM table. Well, this simplydoes not pgsql> scale. They've used a trigger system with a small summary pgsql> table where they update, the numberin the sumary pgsql> field. That works fine, except, that after a few thousand pgsql> updates, the SELECT timetakes a while. Then they have to pgsql> vacuum constanty. It just seems like an ugly and wastefull pgsql> process. Sounds like something that TelegraphCQ can do well .. http://telegraph.cs.berkeley.edu -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh