Обсуждение: Extended customizing, SQL functions, internal variables, API

Поиск
Список
Период
Сортировка

Extended customizing, SQL functions, internal variables, API

От
pgsql@mohawksoft.com
Дата:
I'm not really sure what to call this feature, but I have been talking to
a potential customer and they need a particular feature and they need to
to be very FAST.

Take this query:

select sum(num) from table;

Now, if that table is very large, this can take a lot of time.

Using a trigger, one can update a summary table for changes, i.e. delete,
update, insert. Problem with this is each transaction to the summary table
creates a new row. If you are doing a 100 updates, inserts, deletes a
second, this soon starts to take a long time to access.

Frequent vacuuming now has to happen.

It occurs to me that there is a need for internal state variables that can
be accessed either by functions or something similar.

At PostgreSQL start time, some subsystem allocates and initializes
internal variables. Child processes of PostgreSQL, have this sort of
metaphore:

>>>>>>>>>>> VAR.c >>>>>>>>>>>>>

/* called at init time, create var if nessisary */
PGVAR *var = var_allocate('varname', size);


if( var_lock(var) )
{     /* do something worth while */
    var_unlock(var);
}

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>>>>>>>>>>> VAR.sql >>>>>>>>>
SELECT var_varname;


Does anyone see a need for this?


Re: Extended customizing, SQL functions, internal variables, API

От
Bruno Wolff III
Дата:
On Fri, May 28, 2004 at 12:46:29 -0400, pgsql@mohawksoft.com wrote:
> It occurs to me that there is a need for internal state variables that can
> be accessed either by functions or something similar.

But there still needs to be multiple copies to take into account that
different transactions may need to see different values of the same
variable.


Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
> On Fri, May 28, 2004 at 12:46:29 -0400,
>   pgsql@mohawksoft.com wrote:
>> It occurs to me that there is a need for internal state variables that
>> can
>> be accessed either by functions or something similar.
>
> But there still needs to be multiple copies to take into account that
> different transactions may need to see different values of the same
> variable.
>

Yea, what I'm about to say will cause a lot of people to disagree with me,
and I don't even like the idea for some very small set of examples,
but....

No transactions.

I know this is a very bad thing, and I hate even thinking about it, but
there is a real "need" for this sort of function in some very limited
cases. Let me exaplin, and this really isn't a SQL issue, so much as
flexability to break some rules issue.

My client is sold on PostgreSQL, it works for them perfectly with one
exception. (I have to be careful about NDA stuff here)

The have a database of information that is coming in at a high speed
regular basis. One bit of information is a value. To get this value they
must perform SELECT sum(field) FROM table. Well, this simply does not
scale. They've used a trigger system with a small summary table where they
update, the number in the sumary field. That works fine, except, that
after a few thousand updates, the SELECT time takes a while. Then they
have to vacuum constanty. It just seems like an ugly and wastefull
process.

There is a quick solution, create an internal variable in shared memory
that can be seen by all back-end processes. It is protected by a mutex.

Now, I could roll my own system pretty easily, and probably will do so. It
won't take too much, however, it would be neat if this was in PostgreSQL.

I fully expect that people would worry about this, and I don't blame them.
It is a *bad* idea. Like I said, I could roll my own, but I'm curious if
anyone else sees any benefit to this feature. If it is a feature that
people want, it would best be done from within PostgreSQL. If it is not
something generally wanted, then I'll keep it here or try to get it on
gborg or pgfoundary.


Re: Extended customizing, SQL functions,

От
Jonathan Gardner
Дата:
On Friday 28 May 2004 04:08 pm, pgsql@mohawksoft.com wrote:
>
> No transactions.
>

Then it won't work. Let's say we take your example. The variable "s" is 
supposed to be the sum of a column in a table.

When someone modifies the data in the table, when is "s" adjusted? What if 
it is modified but the transaction is aborted? Who sees the "correct" value 
of "s" at any point in time?

-- 
Jonathan Gardner
jgardner@jonathangardner.net


Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
> On Friday 28 May 2004 04:08 pm, pgsql@mohawksoft.com wrote:
>>
>> No transactions.
>>
>
> Then it won't work. Let's say we take your example. The variable "s" is
> supposed to be the sum of a column in a table.
>
> When someone modifies the data in the table, when is "s" adjusted? What if
> it is modified but the transaction is aborted? Who sees the "correct"
> value
> of "s" at any point in time?

Like I said, it is a *bad* idea. No one is denying this, it "breaks" all
the rules, but "with care" I can see a limited, but valuable function.

>
> --
> Jonathan Gardner
> jgardner@jonathangardner.net
>



Re: Extended customizing, SQL functions,

От
Greg Stark
Дата:
pgsql@mohawksoft.com writes:

> > On Fri, May 28, 2004 at 12:46:29 -0400,
> >   pgsql@mohawksoft.com wrote:
> >
> >> It occurs to me that there is a need for internal state variables that
> >> can be accessed either by functions or something similar.

What you're describing is called "materialized views" and indeed a full
featured implementation capable of caching things like sum() and count() would
be a very nice thing to have. But it's also a lot of work.

A partial solution handling just your setup could be done using triggers like
you describe, but it has the major disadvantage of serializing all
insert/delete/updates on the source table. They all become serialized around
the record with the cache of the sum/count.

> > But there still needs to be multiple copies to take into account that
> > different transactions may need to see different values of the same
> > variable.
> 
> Yea, what I'm about to say will cause a lot of people to disagree with me,
> and I don't even like the idea for some very small set of examples,
> but....
> 
> No transactions.

Well sure you can do that, in which case there are other systems that are more
appropriate than postgres. I would suggest you look at memcached or perhaps
mysql.

> Then they have to vacuum constantly. It just seems like an ugly and wasteful
> process.

If you were using other databases the equivalent work would have to happen in
the middle of the critical path of the transaction, which is even uglier and
more wasteful. If you're using 7.4 the new pg_autovacuum daemon will handle
this for you, you can pretend it isn't happening.

The only alternative is giving up transactions, like you say, in which case
you may as well use a tool that doesn't spend so much effort providing them.

-- 
greg



Re: Extended customizing, SQL functions,

От
Shridhar Daithankar
Дата:
On Saturday 29 May 2004 04:38, pgsql@mohawksoft.com wrote:
> Now, I could roll my own system pretty easily, and probably will do so. It
> won't take too much, however, it would be neat if this was in PostgreSQL.
>
> I fully expect that people would worry about this, and I don't blame them.
> It is a *bad* idea. Like I said, I could roll my own, but I'm curious if
> anyone else sees any benefit to this feature. If it is a feature that
> people want, it would best be done from within PostgreSQL. If it is not
> something generally wanted, then I'll keep it here or try to get it on
> gborg or pgfoundary.

I agree that it could be a nice feature. But it reminds me a quote from a C++ 
FAQ I read once.

----------
*. Should I use exception for error handling?

Ans. The real question is can I afford stack unwinding here...
----------

The situation is similar here. When you want something in database, one 
question is to ask is do I need MVCC here?

Of course depending upon the application context the answer well could be yes. 
But at a lot of places, this could be easily be managed in application and 
probably better be done so.

Personally I do not think managing such information in application is an  
hack. 

Just a thought...
Shridhar


Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
> On Saturday 29 May 2004 04:38, pgsql@mohawksoft.com wrote:
>> Now, I could roll my own system pretty easily, and probably will do so.
>> It
>> won't take too much, however, it would be neat if this was in
>> PostgreSQL.
>>
>> I fully expect that people would worry about this, and I don't blame
>> them.
>> It is a *bad* idea. Like I said, I could roll my own, but I'm curious if
>> anyone else sees any benefit to this feature. If it is a feature that
>> people want, it would best be done from within PostgreSQL. If it is not
>> something generally wanted, then I'll keep it here or try to get it on
>> gborg or pgfoundary.
>
> I agree that it could be a nice feature. But it reminds me a quote from a
> C++
> FAQ I read once.
>
> ----------
> *. Should I use exception for error handling?
>
> Ans. The real question is can I afford stack unwinding here...
> ----------
>
> The situation is similar here. When you want something in database, one
> question is to ask is do I need MVCC here?

I am a HUGE C/C++ guy. A lot of people dump on C++ because it lets you
shoot yourself in the foot with both barrels without even knowing how ...
if you are not careful. Oddly enough, this very flexability is what makes
it a very powerful and useful language.

Similarly, sometimes, it is very difficult to make PostgreSQL do some of
the things that you need too.


>
> Of course depending upon the application context the answer well could be
> yes.
> But at a lot of places, this could be easily be managed in application and
> probably better be done so.
>
> Personally I do not think managing such information in application is an
> hack.
>
> Just a thought...


I may have a unique view of PostgreSQL, but I don't think of it as just a
database. I think of it as a great applications platform. Most
applications today are data centric, and postgresql fits this bill
perfectly. As a data layer it is hard to beat. The fact that it is a great
SQL database in its own right, is a huge bonus. That being said, every now
and then it lacks this small little feature that working around takes a
bit of work. Like functions returning multiple results, that was huge.
Functions returning rows, that was huge too.

Having internal PostgreSQL variables that are not present on disk, or
maybe, variables that are mirrored on disk may be good.

The whole reason why I made this post was to see if other people have had
similar issues and looked for a similar solution, and to think about if
there is a solution that fits within PostgreSQL and how it would work.



Re: Extended customizing, SQL functions,

От
Greg Stark
Дата:
pgsql@mohawksoft.com writes:

> Having internal PostgreSQL variables that are not present on disk, or
> maybe, variables that are mirrored on disk may be good.

I don't think there's anything wrong with your idea, and there are numerous
good solutions that implement it already. But what makes you think this
belongs in Postgres?

There are plenty of memory and disk based shared databases that are
non-transactional and non-relational and meant for storing just this kind of
non-relational data. Some are much faster than postgres for simple
non-concurrent one-record lookups and updates like this.

Use the right tool for the job. Don't try to make one tool do everything,
especially something that's anathema to its basic design.

-- 
greg



Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
> On Saturday 29 May 2004 18:10, pgsql@mohawksoft.com wrote:
>> Having internal PostgreSQL variables that are not present on disk, or
>> maybe, variables that are mirrored on disk may be good.
>
> Yes. I agree. I can see why you proposed no transactions few posts ago.
> Take
> an example of a count variable. It may not have transactions but it is
> expected not to be very accurate anyways.
>
> If I can declare variables which can be changed/read in locked fashion and
> visible to all the backends would be a real great use. It shouldn't have
> transactions because it is not data but a state. It is in database so that
> other connections and stored procedures could see it.
>
> Coupled with the fact that postgresql has custom data types, there is no
> end
> how this could be put to use. Lot more things that sit in application
> layer
> will be inside postgresql, I can image.
>
>> The whole reason why I made this post was to see if other people have
>> had
>> similar issues and looked for a similar solution, and to think about if
>> there is a solution that fits within PostgreSQL and how it would work.
>
> AFAIK, there is no way of doing it in postgresql. But I would love to see
> it
> happen. (I wish I could work on it...:-( )
>

I was thinking that it could be done as a contrib/pgfoundary function. I
think, but am not sure, that a function can be pre-loaded into the main
postgresql backend (postmaster) prior to starting other connections. I'll
have to check that out, but I think it is true, if it isn't it should be
possible to modify postmaster to do so.

The function module, when loaded, looks for its shared memory block, if
none is found, then it procededs to create the block, mutex, etc.

Then, it is a fairly simple task of managing a name/value table protected
by mutex.

Make sense?




Re: Extended customizing, SQL functions,

От
Shridhar Daithankar
Дата:
On Saturday 29 May 2004 20:48, pgsql@mohawksoft.com wrote:
> I was thinking that it could be done as a contrib/pgfoundary function. I
> think, but am not sure, that a function can be pre-loaded into the main
> postgresql backend (postmaster) prior to starting other connections. I'll
> have to check that out, but I think it is true, if it isn't it should be
> possible to modify postmaster to do so.
>
> The function module, when loaded, looks for its shared memory block, if
> none is found, then it procededs to create the block, mutex, etc.
>
> Then, it is a fairly simple task of managing a name/value table protected
> by mutex.
>
> Make sense?

This is effectively a stored procedure isn't it? I have few points on this 
approach

* Every backend has to load it or somebody has to call it as a coding 
standard. Even if postmaster preloads it I doubt it will invoke it but 
correct me if I am wrong.

* This approach puts the code that you would have put in application 
otherwise. I mean it makes sense but just restating for my clarification

* How do you expose pg datatypes to such a system? Or keep it limited to basic 
variables?

* How do you access it from other stored procedures and SQL commands? If you 
do it in C, may be C routines can use it but what about other procedural 
language? I think you would need SPI co-operation.

All in all looks good starting point to me. But it won't be postgresql enough 
but a custom C function. ( That is useful as well though)
Shridhar


Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
>
> pgsql@mohawksoft.com writes:
>
>> Having internal PostgreSQL variables that are not present on disk, or
>> maybe, variables that are mirrored on disk may be good.
>
> I don't think there's anything wrong with your idea, and there are
> numerous
> good solutions that implement it already. But what makes you think this
> belongs in Postgres?
>
> There are plenty of memory and disk based shared databases that are
> non-transactional and non-relational and meant for storing just this kind
> of
> non-relational data. Some are much faster than postgres for simple
> non-concurrent one-record lookups and updates like this.
>
> Use the right tool for the job. Don't try to make one tool do everything,
> especially something that's anathema to its basic design.

I agree completely with one caveat, when the best tool for the job lacks a
feature what do you do?

>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>



Re: Extended customizing, SQL functions,

От
Greg Stark
Дата:
pgsql@mohawksoft.com writes:

> I agree completely with one caveat, when the best tool for the job lacks a
> feature what do you do?

You're missing the point. The feature you want has nothing to do with
relational databases. It has everything to do with in-memory non-transactional
non-relational databases. These things exist but they're not postgres.

Postgres just isn't the best tool for what you want to do.

Try memcached or any of the other very fast non-persistent non-transactional
in-memory databases. If you try to use postgres to do this you'll find -- as
you just did -- that you've bought a lot of overhead for things you don't
want. Because it's not the appropriate tool.

-- 
greg



Re: Extended customizing, SQL functions,

От
pgsql@mohawksoft.com
Дата:
>
> pgsql@mohawksoft.com writes:
>
>> I agree completely with one caveat, when the best tool for the job lacks
>> a
>> feature what do you do?
>
> You're missing the point. The feature you want has nothing to do with
> relational databases. It has everything to do with in-memory
> non-transactional
> non-relational databases. These things exist but they're not postgres.
>
> Postgres just isn't the best tool for what you want to do.
>
> Try memcached or any of the other very fast non-persistent
> non-transactional
> in-memory databases. If you try to use postgres to do this you'll find --
> as
> you just did -- that you've bought a lot of overhead for things you don't
> want. Because it's not the appropriate tool.

That's the problem. It easy to say, in effect, this isn't the job of the
database. Yet, the information is based on what's in the database. It is
one of those ambiguous things that life is so anoyingly full of.  It's
really data related, so it should be in the database, it's really the
application's place to do this, so it should be in the application.

When all is said and done, I would say it is "too" data related to be so
separated from the database.  Remember, PostgreSQL was chosen for the vast
number of advantages, this is just one small issues.


Re: Extended customizing, SQL functions,

От
Shridhar Daithankar
Дата:
On Saturday 29 May 2004 18:10, pgsql@mohawksoft.com wrote:
> Having internal PostgreSQL variables that are not present on disk, or
> maybe, variables that are mirrored on disk may be good.

Yes. I agree. I can see why you proposed no transactions few posts ago. Take 
an example of a count variable. It may not have transactions but it is 
expected not to be very accurate anyways.

If I can declare variables which can be changed/read in locked fashion and 
visible to all the backends would be a real great use. It shouldn't have 
transactions because it is not data but a state. It is in database so that 
other connections and stored procedures could see it.

Coupled with the fact that postgresql has custom data types, there is no end 
how this could be put to use. Lot more things that sit in application layer 
will be inside postgresql, I can image.

> The whole reason why I made this post was to see if other people have had
> similar issues and looked for a similar solution, and to think about if
> there is a solution that fits within PostgreSQL and how it would work.

AFAIK, there is no way of doing it in postgresql. But I would love to see it 
happen. (I wish I could work on it...:-( )
Shridhar


Re: Extended customizing, SQL functions,

От
Sailesh Krishnamurthy
Дата:
>>>>> "pgsql" == pgsql  <pgsql@mohawksoft.com> writes:
   pgsql> The have a database of information that is coming in at a   pgsql> high speed regular basis. One bit of
informationis a   pgsql> value. To get this value they must perform SELECT   pgsql> sum(field) FROM table. Well, this
simplydoes not   pgsql> scale. They've used a trigger system with a small summary   pgsql> table where they update, the
numberin the sumary   pgsql> field. That works fine, except, that after a few thousand   pgsql> updates, the SELECT
timetakes a while. Then they have to   pgsql> vacuum constanty. It just seems like an ugly and wastefull   pgsql>
process.

Sounds like something that TelegraphCQ can do well ..

http://telegraph.cs.berkeley.edu 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh