Обсуждение: Re: [GENERAL] java stored procedures

Поиск
Список
Период
Сортировка

Re: [GENERAL] java stored procedures

От
Barry Lind
Дата:
Laszlo,

I think it would help a lot if you could take a little time to write
down what your planned architecture for a pljava would be.  It then
becomes much easier for myself and probably others reading these lists
to make suggestions on ways to improve what you are planning (or
possible problems with your strategy).  Without knowing what exactly you
are thinking of doing it is difficult to comment.

But let me try throwing out a few thoughts about how I think this should
be done.

First question is how will the jvm be run?  Since postgres is a
multiprocess implementation (i.e. each connection has a separate process
on the server) and since java is a multithreaded implementation (i.e.
one process supporting multiple threads), what should the pljava
implementation look like?  I think there should be a single jvm process
for the entire db server that each postgresql process connects to
through sockets/rmi.  It will be too expensive to create a new jvm
process for each postgresql connection (expensive in both terms of
memory and cpu, since the startup time for the jvm is significant and it
requires a lot of memory).

Having one jvm that all the postgres backend processes communicate with
makes the whole feature much more complicated, but is necessary in my
opinion.

Then the question becomes how does the jvm process interact with the
database since they are two different processes.  You will need some
sort of interprocess communication between the two to execute sql
statements.  This could be accomplished by using the existing jdbc
driver.  But the bigest problem here is getting the transaction
semantics right.  How does a sql statement being run by a java stored
procedure get access to the same connection/transaction as the original
client?  What you don't want happening is that sql issued in a stored
java procedure executes in a different transaction as the caller, what
would rollback of the stored function call mean in that case?

I am very interested in hearing what your plans are for pl/java.  I
think this is a very difficult project, but one that would be very
useful and welcome.

thanks,
--Barry




Laszlo Hornyak wrote:

> Hi!
>
> I am such a lame in the licensing area. As much as I know, BSD license
> is more free than GPL. I think it is too early to think about licensing,
> but it`s ok, you won :), when it will be ready(or it will seem to get
> closer to a working thing, currently it looks more like a interresting
> test), I will ask you if you want to distribute it with Postgres, and if
> you say yes, the license will be the same as Postgresql`s license.
> Anyway is this neccessary when it is the part of the distribution?
> Is this ok for you?
>
> thanks,
> Laszlo Hornyak
>
> ps: still waiting for your ideas, suggestions, etc :) I am not memeber
> of the mailing list, please write me dirrectly!
>
> Barry Lind wrote:
>
>> Laszlo,
>>
>> In my mind it would be more useful if this code was under the same
>> license as the rest of postgresql.  That way it could become part of
>> the product as opposed to always being a separate component.  (Just
>> like plpgsql, pltcl and the other procedural languages).
>>
>> thanks,
>> --Barry
>>
>>
>
>



Re: [GENERAL] java stored procedures

От
Barry Lind
Дата:
Laszlo,


> I am very far from features like this.
> PL/JAVA now:
> -there is a separate process running java (kaffe). this process creates
> a sys v message queue, that holds requests. almost forgot, a shared
> memory segment too. I didn`t find better way to tell postgres the
> informations about the java process.


Does the mechanism you are planning support running any JVM?  In my
opionion Kaffe isn't good enough to be widely useful.  I think you
should be able to plugin whatever jvm is best on your platform, which
will likely be either the Sun or IBM JVMs.

Also, can you explain this a little bit more.  How does the jvm process
get started? (I would hope that the postgresql server processes would
start it when needed, as opposed to requiring that it be started
separately.)  How does the jvm access these shared memory structures?
Since there aren't any methods in the java API to do such things that I
am aware of.

> -the java request_handler function on the server side attaches to the
> shared memory, reads the key of the message queue., attaches to it,
> sends the data of the function, and a signal for the pl/java. after, it
> is waiting for a signal from the java thread.


I don't understand how you do this in java?  I must not be understanding
  something correctly here.

> -when java thread receives the signal, it reads the message(s) from the
> queue, and starts some actions. When done it tells postgres with a
> signal that it is ready, and it can come for its results. This will be
> rewritten see below problems.


Are signals the best way to accomplish this?

> -And postgres is runing, while java is waiting for postgres to say
> something.


But in reality if the postgres process is executing a stored function it
needs to wait for the result of that function call before continuing
doesn't it?

>
> Threading on the java process side is not done yet, ok, it is not that
> hard, I will write it, if it will be realy neccessary.


Agreed, this is important.

>
> The problems, for now:
> I had a very simple system, that passed a very limited scale of argument
> types, with a very limited quantity of parameters (int, varchar, bool).
> Postgres has limits for the argument count too, but not for types. It
> had too much limits, so I am working (or to tell the truth now only
> thinking) on a new type handling that fits the felxibility of
> Postgresql`s type flexibility. For this I will have to learn a lot about
> Postgres`s type system. This will be my program this weekend. :)


Shouldn't this code use all or most of the logic found in the FE/BE
protocol?  Why invent and code another mechanism to transfer data when
one already exists.  (I will admit that the current FE/BE mechanism
isn't the ideal choice, but it seems easier to reuse what exists for now
and improve on it later).


>
> thanks,
> Laszlo Hornyak
>

You didn't mention how you plan to deal with the transaction symantics.
  So what happens when the pl/java function calls through jdbc back to
the server to insert some data?  That should happen in the same
transaction as the caller correct?

thanks,
--Barry


Re: [GENERAL] java stored procedures

От
Gunnar Rønning
Дата:
* Barry Lind <barry@xythos.com> wrote:
|
| possible problems with your strategy).  Without knowing what exactly
| you are thinking of doing it is difficult to comment.

Agreed.

| Having one jvm that all the postgres backend processes communicate
| with makes the whole feature much more complicated, but is necessary
| in my opinion.

I'm not quite sure if I agree here. Startup time is not an issue if you are
using connection pooling on the client and memory is cheap ;-) Having
a separate process would indeed introduce overhead where you don't want
it. In the critical path when executing queries.

| I am very interested in hearing what your plans are for pl/java.  I
| think this is a very difficult project, but one that would be very
| useful and welcome.

I would very much like to hear about the plans myself.


--
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/

Re: [GENERAL] java stored procedures

От
Laszlo Hornyak
Дата:
Hi!

Barry Lind wrote:

> Laszlo,
>
> I think it would help a lot if you could take a little time to write
> down what your planned architecture for a pljava would be.  It then
> becomes much easier for myself and probably others reading these lists
> to make suggestions on ways to improve what you are planning (or
> possible problems with your strategy).  Without knowing what exactly
> you are thinking of doing it is difficult to comment.

>
>
> But let me try throwing out a few thoughts about how I think this
> should be done.
>
> First question is how will the jvm be run?  Since postgres is a
> multiprocess implementation (i.e. each connection has a separate
> process on the server) and since java is a multithreaded
> implementation (i.e. one process supporting multiple threads), what
> should the pljava implementation look like?  I think there should be a
> single jvm process for the entire db server that each postgresql
> process connects to through sockets/rmi.  It will be too expensive to
> create a new jvm process for each postgresql connection (expensive in
> both terms of memory and cpu, since the startup time for the jvm is
> significant and it requires a lot of memory).

I absolutely agree. OK, it`s done.

So, a late-night-brainstorming here:
What I would like to see in PL/JAVA is the object oriented features,
that makes postgresql nice. Creating a new table creates a new class in
the java side too. Instantiating an object of the newly created class
inserts a row into the table. In postgresql tables can be inherited, and
this could be easyly done by pl/java too. I think this would look nice.
But this is not the main feature. Why I would like to see a nice java
procedural language inside postgres is java`s advanced communication
features (I mean CORBA, jdbc, other protocols). This is the sugar in the
caffe.

I am very far from features like this.
PL/JAVA now:
-there is a separate process running java (kaffe). this process creates
a sys v message queue, that holds requests. almost forgot, a shared
memory segment too. I didn`t find better way to tell postgres the
informations about the java process.
-the java request_handler function on the server side attaches to the
shared memory, reads the key of the message queue., attaches to it,
sends the data of the function, and a signal for the pl/java. after, it
is waiting for a signal from the java thread.
-when java thread receives the signal, it reads the message(s) from the
queue, and starts some actions. When done it tells postgres with a
signal that it is ready, and it can come for its results. This will be
rewritten see below problems.
-And postgres is runing, while java is waiting for postgres to say
something.

Threading on the java process side is not done yet, ok, it is not that
hard, I will write it, if it will be realy neccessary.

The problems, for now:
I had a very simple system, that passed a very limited scale of argument
types, with a very limited quantity of parameters (int, varchar, bool).
Postgres has limits for the argument count too, but not for types. It
had too much limits, so I am working (or to tell the truth now only
thinking) on a new type handling that fits the felxibility of
Postgresql`s type flexibility. For this I will have to learn a lot about
Postgres`s type system. This will be my program this weekend. :)

thanks,
Laszlo Hornyak


Re: [GENERAL] java stored procedures

От
Doug McNaught
Дата:
Barry Lind <barry@xythos.com> writes:

> Having one jvm that all the postgres backend processes communicate with makes
> the whole feature much more complicated, but is necessary in my opinion.

Agreed.  Also, the JVM is a multithreaded app, and running it inside a
non-threaded program (the backend) might cause problems.

> Then the question becomes how does the jvm process interact with the database
> since they are two different processes.  You will need some sort of
> interprocess communication between the two to execute sql statements.  This
> could be accomplished by using the existing jdbc driver.  But the bigest
> problem here is getting the transaction semantics right.  How does a sql
> statement being run by a java stored procedure get access to the same
> connection/transaction as the original client?  What you don't want happening
> is that sql issued in a stored java procedure executes in a different
> transaction as the caller, what would rollback of the stored function call
> mean in that case?

I think you would have to to expose the SPI layer to Java running in a
separate process, either using an RMI server written in C or a custom
protocol over a TCP socket (Java of course can't do Unix sockets).
This raises some thorny issues of authentication and security but I
don't think they're insurmountable.  You could, for example, create a
cryptographically strong "cookie" in the backend when a Java function
is called.  The cookie would be passed to the Java function when it
gets invoked, and then must be passed back to the SPI layer in order
for the latter to accept the call.  A bit clunky but should be safe as
far as I can see.

The cookie would be needed anyhow, I think, in order for the SPI layer
to be able to find the transaction that the Java function was
originally invoked in.

You could make the SPI layer stuff look like a normal JDBC driver to
user code--PL/Perl does this kind of thing with the Perl DBI
interface.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: [GENERAL] java stored procedures

От
Laszlo Hornyak
Дата:
Hi!

Barry Lind wrote:

> Does the mechanism you are planning support running any JVM?  In my
> opionion Kaffe isn't good enough to be widely useful.  I think you
> should be able to plugin whatever jvm is best on your platform, which
> will likely be either the Sun or IBM JVMs.

Ok, I also had problems with caffe, but it may work. I like it becouse
it is small (the source is about 6M). As much as I know Java VM`s has a
somewhat standard native interface called JNI. I use this to start the
VM, and communicate with it. If you think I should change I will do it,
but it may take a long time to get the new VM. For then I have to run kaffe.

> Also, can you explain this a little bit more.  How does the jvm
> process get started? (I would hope that the postgresql server
> processes would start it when needed, as opposed to requiring that it
> be started separately.)  How does the jvm access these shared memory
> structures? Since there aren't any methods in the java API to do such
> things that I am aware of.

JVM does not. 'the java process' does with simple posix calls. I use
debian potatoe, on any other posix system it should work, on any other
somewhat posix compatible system it may work, I am not sure...

>
> I don't understand how you do this in java?  I must not be
> understanding  something correctly here.

My failure.
The 'java request_handler' is not a java function, it is the C
call_handler in the Postgres side, that is started when a function of
language 'pljava' is called.
I made some failure in my previous mail. At home I named the pl/java
language pl/pizza (something that is not caffe, but well known enough
:). The application has two running binaries:
-pizza (which was called 'java process' last time) This is a small C
program that uses JNI to start VM and call java methods.
-plpizza.so the shared object that contains the call_handler function.


>
>
>> -when java thread receives the signal, it reads the message(s) from
>> the queue, and starts some actions. When done it tells postgres with
>> a signal that it is ready, and it can come for its results. This will
>> be rewritten see below problems.
>
>
>
> Are signals the best way to accomplish this?

I don`t know if it is the best, it is the only way I know :)
Do you know any other ways?

>
>
>> -And postgres is runing, while java is waiting for postgres to say
>> something.
>
> But in reality if the postgres process is executing a stored function
> it needs to wait for the result of that function call before
> continuing doesn't it?

Surely, this is done. How could Postgres tell the result anyway ? :)

>
>>
>> Threading on the java process side is not done yet, ok, it is not
>> that hard, I will write it, if it will be realy neccessary.
>
> Agreed, this is important.
>
> Shouldn't this code use all or most of the logic found in the FE/BE
> protocol?  Why invent and code another mechanism to transfer data when
> one already exists.  (I will admit that the current FE/BE mechanism
> isn't the ideal choice, but it seems easier to reuse what exists for
> now and improve on it later).

Well, I am relatively new to Postgres, and I don`t know these protocols.
In the weekend I will start to learn it, and in Sunday or Monday I maybe
I will understand it, if not, next weekend..

>
> You didn't mention how you plan to deal with the transaction
> symantics.  So what happens when the pl/java function calls through
> jdbc back to the server to insert some data?  That should happen in
> the same transaction as the caller correct?

I don`t think this will be a problem, I have ideas for this. Idea mean:
I know how I will start it, it may be good, or it may be fataly stupid
idea, it will turn out when I tried it. Simply: The same way plpizza
tells pizza the request, pizza can talk back to plpizza. This is planed
to work with similar mechanism I described last time (shm+signals).

Monday I will try to send a little pieces of code to make thing clear, ok?

thanks,
Laszlo Hornyak


Re: [GENERAL] java stored procedures

От
Laszlo Hornyak
Дата:
Barry Lind wrote:

>
> I also stopped cc'ing the general list, since this is getting too
> detailed for most of the members on that list.

Ok.

> Now to your mail:
>
> This seems like a reasonable approach and should work across different
> JVMs.  It would probably be a good experiment to try this with the Sun
> or IBM jvm at some point to verify.  What I was afraid of was that you
> were hacking the Kaffe code to perform the integration which would
> limit this solution to only using Kaffe.

I am sure they wont work the same way. I think I have a sun jdk 1.3.0-2,
so I will try to port it soon. The IBM implementation must wait I think
until january.

>
> Just a suggestion:  PL/J might be a good name, since as you probably
> know it can't be called pl/java because of the trademark restrictions
> on the word 'java'.

Ok, you won, I do not read the licenses. From now it`s name is pl/j.
Isn`t 'j' too short for the name of the process that runns java? :)

>
> I am a little concerned about the stability and complexity of having
> this '-pizza' program be responsible for handling the calls on the
> java side.  My concern is that this will need to be a multithreaded
> program since multiple backends will concurrently be needing to
> interact with multiple java threads through this one program.  It
> might be simpler if each postgres process directly communicated to a
> java thread via a tcpip socket.  Then the "-pizza" program would only
> need to be responsible for starting up the jvm and creating java
> threads and sockets for a postgres process (it would perform a similar
> role to postmaster for postgres client connections).

With good design we can solve stability problems. As much as I know, if
postmaster dies, the postgres server becomes unavailable, this looks the
same problem. I do not know if we realy need sockets. Anyway, if 'j'
dies, we can create a new one, and restart calculations. Some watchdog
functionality...
Doing thing with sockets need a lot of rework. It is the best time for
this, while there is not too much thing done.

>>>
>>>> -when java thread receives the signal, it reads the message(s) from
>>>> the queue, and starts some actions. When done it tells postgres
>>>> with a signal that it is ready, and it can come for its results.
>>>> This will be rewritten see below problems.
>>>
>>> Are signals the best way to accomplish this?
>>
>> I don`t know if it is the best, it is the only way I know :)
>> Do you know any other ways?
>>
> I don't know, but hopefully someone on the hackers list will chip in
> here with a comment.

After a first developement cycle (if my brain doesn`t burn down), the
signals can be replaced to a plugable communication interface I think.
So maybe we can use CORBA, or sockets, or something else. This will take
a lot of time.

> OK, so the same backend process that called the function gets messaged
> to process the sql.  This should work.  However it means you will need
> a special version of the jdbc driver that uses this shm+signals
> communication mechanism instead of what the current jdbc driver does.
> This is something I would be happy to help you with.


This is kind of you. :)
For this, I will have to finish the protocol of communication. I have to
learn Postgres enough, so I am not sure this will be done this weekend.
I have ideas, only time is needed to implement them or to recognize the
failures.

Thanks,
Laszlo Hornyak



Re: [GENERAL] java stored procedures

От
Laszlo Hornyak
Дата:
Hi!

Sorry, I have time only for short ansvers, it is company time :((.

Gunnar Rønning wrote:

>* Barry Lind <barry@xythos.com> wrote:
>|
>| possible problems with your strategy).  Without knowing what exactly
>| you are thinking of doing it is difficult to comment.
>
>Agreed.
>
Ok, I will try to bring the code here before Monday, or at least some
pieces. It is full of hardcoded constants from my developement
environment. :(


>
>| I am very interested in hearing what your plans are for pl/java.  I
>| think this is a very difficult project, but one that would be very
>| useful and welcome.
>
>I would very much like to hear about the plans myself.
>
I do not see so big difficulities yet, am I so lame? It won`t be easy,
realy, we should keep it simple, at least becouse of me.


thanks,
Laszlo Hornyak


Re: [GENERAL] java stored procedures

От
Barry Lind
Дата:
Laszlo,

I have cc'ed the hackers mail list since that group of developers is
probably better able than I to make suggestions on the best interprocess
communication mechanism to use for this.  See
http://archives2.us.postgresql.org/pgsql-general/2001-12/msg00092.php
for background on this thread.

I also stopped cc'ing the general list, since this is getting too
detailed for most of the members on that list.

Now to your mail:

Laszlo Hornyak wrote:

> Hi!
>
> Barry Lind wrote:
>
>> Does the mechanism you are planning support running any JVM?  In my
>> opionion Kaffe isn't good enough to be widely useful.  I think you
>> should be able to plugin whatever jvm is best on your platform, which
>> will likely be either the Sun or IBM JVMs.
>
>
> Ok, I also had problems with caffe, but it may work. I like it becouse
> it is small (the source is about 6M). As much as I know Java VM`s has a
> somewhat standard native interface called JNI. I use this to start the
> VM, and communicate with it. If you think I should change I will do it,
> but it may take a long time to get the new VM. For then I have to run
> kaffe.
>


This seems like a reasonable approach and should work across different
JVMs.  It would probably be a good experiment to try this with the Sun
or IBM jvm at some point to verify.  What I was afraid of was that you
were hacking the Kaffe code to perform the integration which would limit
this solution to only using Kaffe.


>> Also, can you explain this a little bit more.  How does the jvm
>> process get started? (I would hope that the postgresql server
>> processes would start it when needed, as opposed to requiring that it
>> be started separately.)  How does the jvm access these shared memory
>> structures? Since there aren't any methods in the java API to do such
>> things that I am aware of.
>
>
> JVM does not. 'the java process' does with simple posix calls. I use
> debian potatoe, on any other posix system it should work, on any other
> somewhat posix compatible system it may work, I am not sure...
>
>>
>> I don't understand how you do this in java?  I must not be
>> understanding  something correctly here.
>
>
> My failure.
> The 'java request_handler' is not a java function, it is the C
> call_handler in the Postgres side, that is started when a function of
> language 'pljava' is called.
> I made some failure in my previous mail. At home I named the pl/java
> language pl/pizza (something that is not caffe, but well known enough
> :). The application has two running binaries:
> -pizza (which was called 'java process' last time) This is a small C
> program that uses JNI to start VM and call java methods.
> -plpizza.so the shared object that contains the call_handler function.
>


Just a suggestion:  PL/J might be a good name, since as you probably
know it can't be called pl/java because of the trademark restrictions on
the word 'java'.

I am a little concerned about the stability and complexity of having
this '-pizza' program be responsible for handling the calls on the java
side.  My concern is that this will need to be a multithreaded program
since multiple backends will concurrently be needing to interact with
multiple java threads through this one program.  It might be simpler if
each postgres process directly communicated to a java thread via a tcpip
socket.  Then the "-pizza" program would only need to be responsible for
starting up the jvm and creating java threads and sockets for a postgres
process (it would perform a similar role to postmaster for postgres
client connections).


>
>>
>>
>>> -when java thread receives the signal, it reads the message(s) from
>>> the queue, and starts some actions. When done it tells postgres with
>>> a signal that it is ready, and it can come for its results. This will
>>> be rewritten see below problems.
>>
>>
>>
>>
>> Are signals the best way to accomplish this?
>
>
> I don`t know if it is the best, it is the only way I know :)
> Do you know any other ways?
>


I don't know, but hopefully someone on the hackers list will chip in
here with a comment.


>>
>>>
>>> Threading on the java process side is not done yet, ok, it is not
>>> that hard, I will write it, if it will be realy neccessary.
>>
>>
>> Agreed, this is important.
>>
>> Shouldn't this code use all or most of the logic found in the FE/BE
>> protocol?  Why invent and code another mechanism to transfer data when
>> one already exists.  (I will admit that the current FE/BE mechanism
>> isn't the ideal choice, but it seems easier to reuse what exists for
>> now and improve on it later).
>
>
> Well, I am relatively new to Postgres, and I don`t know these protocols.
> In the weekend I will start to learn it, and in Sunday or Monday I maybe
> I will understand it, if not, next weekend..
>
>>
>> You didn't mention how you plan to deal with the transaction
>> symantics.  So what happens when the pl/java function calls through
>> jdbc back to the server to insert some data?  That should happen in
>> the same transaction as the caller correct?
>
>
> I don`t think this will be a problem, I have ideas for this. Idea mean:
> I know how I will start it, it may be good, or it may be fataly stupid
> idea, it will turn out when I tried it. Simply: The same way plpizza
> tells pizza the request, pizza can talk back to plpizza. This is planed
> to work with similar mechanism I described last time (shm+signals).
>


OK, so the same backend process that called the function gets messaged
to process the sql.  This should work.  However it means you will need a
special version of the jdbc driver that uses this shm+signals
communication mechanism instead of what the current jdbc driver does.
This is something I would be happy to help you with.



Re: [HACKERS] [GENERAL] java stored procedures

От
Hannu Krosing
Дата:
Laszlo Hornyak wrote:

>>
>> | I am very interested in hearing what your plans are for pl/java.  I
>> | think this is a very difficult project, but one that would be very
>> | useful and welcome.
>>
>> I would very much like to hear about the plans myself.
>
> I do not see so big difficulities yet, am I so lame? It won`t be easy,
> realy, we should keep it simple, at least becouse of me.

Let me propose a very different approach to PL/J - use gcc-java and
figure out the problems
with (dynamic) compiling and dynamic linking.

This is an approach somewhat similar to .NET/C# that you first compile
things and then run instead
of trying to do both at the same time ;)

Oracle /may/ be doing something similar with their java stored
procedures, as they claim these to be "compiled".

-----------------
Hannu