Discussion: BLOBs etc
Hi,

on the jdbc-webpages it says, that the JDBC team may decide to change getBlob/setBlob to support bytea only, and that one should use the PostgreSQL-specific LargeObject extension to access them. Well, how should one be able to use the LargeObject extension when connection pools are used? The Connection object supplied by the connection-pool implementation is usually not the one supplied by the JDBC driver, and therefore the object cannot be cast to PGConnection anymore. So the LargeObject extension can no longer be used as suggested on the page

http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

Is there any kind of support for storing data in oid columns which will work with connection-pool implementations?

Thx,
Sven
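The cast failure Sven describes can be reproduced without a real pool: a wrapper that implements only java.sql.Connection is not an instance of the driver's extension interface. A minimal sketch — the PGConnection interface below is an empty stand-in for org.postgresql.PGConnection, and the no-op proxy stands in for a pool's connection wrapper:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;

public class PoolCastDemo {
    // Stand-in for org.postgresql.PGConnection; the real interface
    // lives in the PostgreSQL JDBC driver.
    interface PGConnection {}

    // Build a wrapper the way many pool managers do: a dynamic proxy
    // exposing only the standard java.sql.Connection interface.
    static Connection wrap() {
        InvocationHandler noop = (proxy, method, args) -> null;
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                noop);
    }

    public static void main(String[] args) {
        Connection pooled = wrap();
        // The wrapper is a Connection, but not a PGConnection,
        // so the cast Sven mentions would throw ClassCastException.
        System.out.println(pooled instanceof PGConnection); // prints false
    }
}
```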
Sven Köhler wrote:
> Well, how should one be able to use the LargeObject extension when
> connection pools are used? The Connection object supplied by the
> connection-pool implementation is usually not the one supplied by the
> JDBC driver, and therefore the object cannot be cast to PGConnection
> anymore.

If your connection pool manager uses the driver's ConnectionPoolDataSource implementation this shouldn't be a problem, as the proxy connections also implement PGConnection.

-O
On Fri, 7 Jan 2005, Sven Köhler wrote:

> on the jdbc-webpages it says, that the JDBC team may decide to change
> getBlob/setBlob to support bytea only, and that one should use the
> PostgreSQL-specific LargeObject extension to access them.

I'm not sure where it says that, but it is either out of date or just plain bad advice. If you could point out where it says that I'd appreciate it. I would stick with the standard Blob interface for now. There are no plans to change this until better lob support is added to the server, and I don't know of anyone working on that. Even when that happens, backward compatibility will be provided by a URL parameter.

> Well, how should one be able to use the LargeObject extension when
> connection pools are used? The Connection object supplied by the
> connection-pool implementation is usually not the one supplied by the
> JDBC driver, and therefore the object cannot be cast to PGConnection
> anymore. So the LargeObject extension can no longer be used as
> suggested on the page

That depends on your connection pool. For example, Jakarta's DBCP allows access to the underlying connection like this:

    PoolingDataSource dataSource = ...;
    dataSource.setAccessToUnderlyingConnectionAllowed(true);
    Connection conn = dataSource.getConnection();
    Connection realConn = ((DelegatingConnection) conn).getInnermostDelegate();
    PGConnection pgConn = (PGConnection) realConn;

Kris Jurka
>> on the jdbc-webpages it says, that the JDBC team may decide to change
>> getBlob/setBlob to support bytea only, and that one should use the
>> PostgreSQL-specific LargeObject extension to access them.
>
> I'm not sure where it says that, but it is either out of date or just
> plain bad advice. If you could point out where it says that I'd
> appreciate it. I would stick with the standard Blob interface for now.

http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

There it says:

    Note: In a future release of the JDBC driver, the getBLOB() and
    setBLOB() methods may no longer interact with Large Objects and will
    instead work on the data type bytea. So it is recommended that you
    use the LargeObject API if you intend to use Large Objects.

BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and oid? Shouldn't the JDBC driver be able to determine the used datatype and act accordingly? I'm sure you had your reasons. I'd just like to know them.
On Fri, 7 Jan 2005, Sven Köhler wrote:

>>> on the jdbc-webpages it says, that the JDBC team may decide to change
>>> getBlob/setBlob to support bytea only, and that one should use the
>>> PostgreSQL-specific LargeObject extension to access them.
>>
>> I'm not sure where it says that, but it is either out of date or just
>> plain bad advice. If you could point out where it says that I'd
>> appreciate it. I would stick with the standard Blob interface for now.
>
> http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

The 7.4 docs do not really get updated any more, but I will make a change to this in the 8.0 docs. Thanks.

> BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and
> oid? Shouldn't the JDBC driver be able to determine the used datatype
> and act accordingly? I'm sure you had your reasons. I'd just like to
> know them.

It certainly could do that. I'm guessing that it wasn't done to maintain symmetry with the set methods, knowing that only one can work. Making getInputStream work on oid would be easy, but making getBlob work on bytea would be more work to write a wrapper. There is also no real reason to use getBlob on bytea because no streaming is supported.

Do you think the increased flexibility is worth the potential for confusion when the corresponding set method doesn't work?

Kris Jurka
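The dispatch Kris says the driver "certainly could do" amounts to branching on the result column's type OID. A hypothetical sketch — the constants 17 and 26 are PostgreSQL's built-in pg_type OIDs for bytea and oid, while the class and method names are invented for illustration:

```java
public class BinaryDispatch {
    // PostgreSQL built-in type OIDs (from pg_type).
    static final int BYTEA_OID = 17;
    static final int OID_OID   = 26;

    // Decide how a get method would fetch binary data: bytea values
    // arrive inline with the row, while an oid column points at a
    // large object that must be opened through the LO API.
    static String accessPathFor(int columnTypeOid) {
        switch (columnTypeOid) {
            case BYTEA_OID: return "inline";
            case OID_OID:   return "large-object";
            default:
                throw new IllegalArgumentException(
                        "not a binary column: type oid " + columnTypeOid);
        }
    }

    public static void main(String[] args) {
        System.out.println(accessPathFor(BYTEA_OID)); // inline
        System.out.println(accessPathFor(OID_OID));   // large-object
    }
}
```

The set side has no such OID to inspect, which is exactly the asymmetry Kris describes next.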
>> BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and
>> oid? Shouldn't the JDBC driver be able to determine the used datatype
>> and act accordingly? I'm sure you had your reasons. I'd just like to
>> know them.
>
> It certainly could do that. I'm guessing that it wasn't done to
> maintain symmetry with the set methods, knowing that only one can
> work. Making getInputStream work on oid would be easy, but making
> getBlob work on bytea would be more work to write a wrapper. There is
> also no real reason to use getBlob on bytea because no streaming is
> supported.
>
> Do you think the increased flexibility is worth the potential for
> confusion when the corresponding set method doesn't work?

Ohh, well. The "getInputStream/getBLOB/..." above was meant to include the set methods. After all you said, it seems reasonable not to implement set/getBlob() for bytea.

On the other hand, it may lower the compatibility of the JDBC driver with generic applications if there is a datatype for which either set/getBlob() or set/getBinaryStream() doesn't work. So one should avoid using bytea. But am I right that set/getBinaryStream won't work for oid columns?

I think delegating set/getBinaryStream() to set/getBlob() would be easy if the JDBC driver knew when to do it. Would that be possible? That would make oid columns the most compatible, as they would allow the use of both the binary-stream methods and the blob methods. The compatibility of the JDBC driver could be improved by that.
On Fri, 7 Jan 2005, Sven Köhler wrote:

> Ohh, well. The "getInputStream/getBLOB/..." above was meant to include
> the set methods. After all you said, it seems reasonable not to
> implement set/getBlob() for bytea.

The JDBC driver doesn't know what the target table looks like. It must blindly send data and hope it matches. This is why the set methods can only work for one type while the get methods could work for both.

Kris Jurka
> The JDBC driver doesn't know what the target table looks like. It must
> blindly send data and hope it matches. This is why the set methods can
> only work for one type while the get methods could work for both.

Is this going to be improved? Either by using server-side prepared statements or by changing the server's behaviour somehow?
On Fri, 7 Jan 2005, Sven Köhler wrote:

>> The JDBC driver doesn't know what the target table looks like. It must
>> blindly send data and hope it matches. This is why the set methods can
>> only work for one type while the get methods could work for both.
>
> Is this going to be improved? Either by using server-side prepared
> statements or by changing the server's behaviour somehow?

No, this was actually a design decision. It is possible to determine the expected data type in many cases, but the downside is that it requires a network roundtrip to the server. For simple statements this has the potential to nearly double execution time, so we don't want to do that.

Could we perhaps do this for prepared statements we expect to reuse? We could, but then you've introduced an odd inconsistency where sometimes things will work and sometimes they won't.

Kris Jurka
>>> The JDBC driver doesn't know what the target table looks like. It must
>>> blindly send data and hope it matches. This is why the set methods can
>>> only work for one type while the get methods could work for both.
>>
>> Is this going to be improved? Either by using server-side prepared
>> statements or by changing the server's behaviour somehow?
>
> No, this was actually a design decision. It is possible to determine the
> expected data type in many cases, but the downside is that it requires a
> network roundtrip to the server. For simple statements this has the
> potential to nearly double execution time, so we don't want to do that.

I see the dilemma.

> Could we perhaps do this for prepared statements we expect to reuse? We
> could, but then you've introduced an odd inconsistency where sometimes
> things will work and sometimes they won't.

An inconsistency is not tolerable. But still, the PostgreSQL server could accept the data generated by the JDBC driver's setBinaryStream() even for oid columns. Isn't that the missing piece to make the set/getBinaryStream() methods work for oid columns?

Is it known how other JDBC drivers handle these problems? Do they only implement set/getBinaryStream() or set/getBlob()? I'd expect set/getBinaryStream() to work at least, since it is the most simple way to get the data.

I don't want to bother you any longer if other drivers aren't any better, but it seems to me like there's no uniform way to get binary data from a database via JDBC.
On Fri, 7 Jan 2005, Sven Köhler wrote:

> But still, the PostgreSQL server could accept the data generated by the
> JDBC driver's setBinaryStream() even for oid columns. Isn't that the
> missing piece to make the set/getBinaryStream() methods work for oid
> columns?

That would work, but it's not going to happen. Setting up and using a large object is completely different than just stashing data in a bytea field. Convincing the server to do simple conversions is difficult enough; getting it to do something of this magnitude is out of the question.

> Is it known how other JDBC drivers handle these problems?

The real problem here is that pg has two binary data types that work quite differently and each has significant strengths/weaknesses. Other databases don't have this predicament. If we only had one, or one was clearly superior, or they used a remotely similar API, we'd be set.

Kris Jurka
On Fri, 7 Jan 2005, Thomas Hallgren wrote:

> Kris Jurka wrote:
>> That would work, but it's not going to happen. Setting up and using a
>> large object is completely different than just stashing data in a bytea
>> field. Convincing the server to do simple conversions is difficult
>> enough; getting it to do something of this magnitude is out of the
>> question.
>
> Why? After all, both types represent a sequence of bytes, so it's not
> really doing a conversion. It's just allowing multiple ways of accessing
> it.
>
> I'm not sure it would be that much work to actually make the server
> accept both "by value" and "by stream" semantics for both types.
> Implemented correctly, all clients (not just JDBC) could benefit.

I'm not clear what you're advocating in concrete terms. A new pseudo-type "binary data" that could be used until it needs to be converted into a concrete type? When does this conversion have to happen? Consider cases like:

    SELECT ? = ?;
    SELECT myfunc(?);
    INSERT INTO mytab
        SELECT * FROM (SELECT 1 UNION SELECT 2) t1(a)
        LEFT JOIN (SELECT 1, ?) t2(a) ON (t1.a = t2.a);

In the abstract, sure, they're both just streams of bytes, but I think when you have to actually handle this in the server you'll find they are not so similar.

Note also that when streaming a large object to the server, you are streaming it directly into permanent storage a chunk at a time, so not much memory is used. If you don't know where it is going you can't stream it to its destination, and you've got to either put it into a temporary disk location or keep it in memory.

Kris Jurka
Kris Jurka wrote:
> That would work, but it's not going to happen. Setting up and using a
> large object is completely different than just stashing data in a bytea
> field. Convincing the server to do simple conversions is difficult
> enough; getting it to do something of this magnitude is out of the
> question.

Why? After all, both types represent a sequence of bytes, so it's not really doing a conversion. It's just allowing multiple ways of accessing it.

I'm not sure it would be that much work to actually make the server accept both "by value" and "by stream" semantics for both types. Implemented correctly, all clients (not just JDBC) could benefit.

Regards, Thomas Hallgren
Kris,

> I'm not clear what you're advocating in concrete terms. A new pseudo-type
> "binary data" that could be used until it needs to be converted into a
> concrete type? When does this conversion have to happen?

Couldn't this be limited to the client/server protocol? There are 4 cases that will fail today. If you attempt to:

1. stream LOB data from the client for storage in a bytea.
2. send bytea data "by value" for storage in a LOB.
3. read LOB data that is passed as a bytea.
4. read bytea data that is really a LOB.

As you pointed out earlier, it should be possible to make the client handle cases #3 and #4, so what's left for the server is to deal with #1 and #2. That doesn't sound like rocket science to me.

Regards, Thomas Hallgren
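Thomas's four failing cases all reduce to a mismatch between the client's access style and the server-side type, in either direction: streaming pairs with LOBs, by-value pairs with bytea, and the two cross combinations fail. An illustrative encoding of that matrix (all names here are invented, nothing in the driver or server is named this way):

```java
public class BinaryCompat {
    enum ClientStyle { STREAM, BY_VALUE }
    enum ServerType  { LOB, BYTEA }

    // Today a transfer only works when the access style matches the
    // stored type; the two mismatched combinations are the failing
    // cases #1-#4 listed above (writes and reads alike).
    static boolean works(ClientStyle style, ServerType type) {
        return (style == ClientStyle.STREAM) == (type == ServerType.LOB);
    }

    public static void main(String[] args) {
        // case 1: streaming into a bytea fails
        System.out.println(works(ClientStyle.STREAM, ServerType.BYTEA));
        // case 2: sending by value into a LOB fails
        System.out.println(works(ClientStyle.BY_VALUE, ServerType.LOB));
    }
}
```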
On Fri, 7 Jan 2005, Thomas Hallgren wrote:

> Kris,
>> I'm not clear what you're advocating in concrete terms. A new pseudo-type
>> "binary data" that could be used until it needs to be converted into a
>> concrete type? When does this conversion have to happen?
>
> Couldn't this be limited to the client/server protocol? There are 4
> cases that will fail today. If you attempt to:
>
> 1. stream LOB data from the client for storage in a bytea.
> 2. send bytea data "by value" for storage in a LOB.
> 3. read LOB data that is passed as a bytea.
> 4. read bytea data that is really a LOB.
>
> As you pointed out earlier, it should be possible to make the client
> handle cases #3 and #4, so what's left for the server is to deal with #1
> and #2. That doesn't sound like rocket science to me.

Well, you've summarized the problem, but you certainly haven't given any concrete suggestions on how this could actually be done. I'm not sure this even qualifies as handwaving.

Kris Jurka
Thomas Hallgren wrote:
> Kris Jurka wrote:
>> That would work, but it's not going to happen. Setting up and using a
>> large object is completely different than just stashing data in a bytea
>> field. Convincing the server to do simple conversions is difficult
>> enough; getting it to do something of this magnitude is out of the
>> question.
>
> Why? After all, both types represent a sequence of bytes, so it's not
> really doing a conversion. It's just allowing multiple ways of accessing
> it.

The "different ways" involve different transaction semantics. You can't currently deal with LOBs when autocommit is on. How would you deal with that?

-O
Kris,

> Well, you've summarized the problem, but you certainly haven't given any
> concrete suggestions on how this could actually be done. I'm not sure
> this even qualifies as handwaving.

Hmm, handwaving or not, your statement "getting it to do something of this magnitude is out of the question" perhaps triggered a certain urge to define what that "something" really is. My intention was to summarize the problem and show that it isn't that big a deal. I don't think it would take that much imagination to figure out what needs to be done.

Regards, Thomas Hallgren
Oliver Jowett wrote:
> The "different ways" involve different transaction semantics. You can't
> currently deal with LOBs when autocommit is on. How would you deal with
> that?

I'm aware that the transaction semantics are somewhat different, and I've proposed a solution for that on the hackers list. Please read my proposal in the thread "Status of the server side Large Object support".

Regards, Thomas Hallgren
Thomas Hallgren wrote:
> Oliver Jowett wrote:
>> The "different ways" involve different transaction semantics. You
>> can't currently deal with LOBs when autocommit is on. How would you
>> deal with that?
>
> I'm aware that the transaction semantics are somewhat different, and I've
> proposed a solution for that on the hackers list. Please read my
> proposal in the thread "Status of the server side Large Object support".

The thread in question is a month old; here's an archive link for the benefit of the list:

http://archives.postgresql.org/pgsql-hackers/2004-11/msg01018.php

The followups seemed to be along the lines of "use EXTERNAL storage and improve bytea to support partial updates". Is your proposal a serious candidate for merging?

-O
Oliver Jowett wrote:
> The thread in question is a month old; here's an archive link for the
> benefit of the list:
>
> http://archives.postgresql.org/pgsql-hackers/2004-11/msg01018.php

Yes, that's the thread; for some reason I wasn't able to read the archives last night. The posting I was referring to is this one:

http://archives.postgresql.org/pgsql-hackers/2004-11/msg01106.php

> The followups seemed to be along the lines of "use EXTERNAL storage and
> improve bytea to support partial updates". Is your proposal a serious
> candidate for merging?

I didn't receive any comments on that proposal, so nobody seemed to be particularly interested. Should the question come up again at a convenient time, and if some more people would provide feedback and perhaps help out, then yes, I might consider actually contributing what I propose.

I did some more thinking about transaction semantics and auto-commit. The above link will not really solve what you're after, I think. But I'm not sure that the auto-commit differences will pose a problem.

Let's assume that you have a bytea in a table and that you want to store something in that table using a Blob or binary stream in the client. The way I see it, this can be done in one of two ways.

1. You prepare a statement, you do a setBinaryStream, and then an executeUpdate. This is only one single statement, so the auto-commit setting shouldn't really make a difference. The server must be changed to accommodate that it receives stream data instead of a bytea, but that's basically it.

2. You do a select and use a getBlob() on the ResultSet. The JDBC driver that reads the result will conclude that the column is actually a bytea, so it will treat it as such and wrap it in a special Blob implementation in the client. The Blob instance must contain a primary key, a column identifier, and the data.
If auto-commit is true, then each change made to this Blob using setBytes, or writes through a stream obtained using getOutputStream, would be committed immediately. Not very efficient, but it would work. Another possibility is of course to consider this illegal.

Now assume that it is the other way around. The database contains a LOB and you want to change it using setBytes. Again, this is a single statement, so auto-commit shouldn't make a difference. The server must be able to take the bytea data it gets when it was expecting stream data and transform that into a LOB access.

Do you see more issues with auto-commit?

Regards, Thomas Hallgren
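The client-side wrapper in way 2 could look roughly like this: a Blob-style view over a bytea value already fetched with the row. Only the read side is sketched, the class name and shape are invented (a real driver class would implement java.sql.Blob and carry the primary key and column identifier needed to write changes back), and java.sql.Blob's 1-based position convention is kept:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;

// Illustrative only: a read-only, Blob-like view over a bytea value
// that arrived inline with the result row.
public class ByteaBlob {
    private final byte[] data;

    public ByteaBlob(byte[] data) {
        // Copy so later changes to the caller's array don't leak in.
        this.data = data.clone();
    }

    public long length() {
        return data.length;
    }

    // java.sql.Blob positions are 1-based, so translate to an
    // array offset and clamp the requested length to the data.
    public byte[] getBytes(long pos, int length) {
        int start = (int) pos - 1;
        int end = Math.min(start + length, data.length);
        return Arrays.copyOfRange(data, start, end);
    }

    public InputStream getBinaryStream() {
        return new ByteArrayInputStream(data);
    }
}
```

Since the whole value is already in client memory, there is no streaming benefit here, which matches Kris's earlier point that getBlob on bytea buys little beyond interface compatibility.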
Thomas Hallgren wrote:
> The server must be able to take the bytea data it gets when it was
> expecting stream data and transform that into a LOB access.

Is doing this transformation work really worth it? If you're going to do non-trivial server changes, aren't you better off improving bytea to solve the problems it has with large binary data and using bytea everywhere?

-O
Oliver Jowett wrote:
> Is doing this transformation work really worth it? If you're going to do
> non-trivial server changes, aren't you better off improving bytea to
> solve the problems it has with large binary data and using bytea
> everywhere?

Probably. My LOB change suggestion was more based on keeping things simple and not changing the server much. As you point out, merging that with a more significant protocol change will impose more changes anyhow, so why not go the full route and get rid of LOBs as they stand today altogether.

In any case, a good start would be to allow stream access to a bytea on the server side, so that the client can use Blob access and binary streams.

A related topic is of course text and Clobs. It would be nice if they could be handled the same way, with the addition of the needed encoding support.

Regards, Thomas Hallgren
>> I'm not clear what you're advocating in concrete terms. A new pseudo-type
>> "binary data" that could be used until it needs to be converted into a
>> concrete type? When does this conversion have to happen?
>
> Couldn't this be limited to the client/server protocol? There are 4
> cases that will fail today. If you attempt to:
>
> 1. stream LOB data from the client for storage in a bytea.
> 2. send bytea data "by value" for storage in a LOB.
> 3. read LOB data that is passed as a bytea.
> 4. read bytea data that is really a LOB.
>
> As you pointed out earlier, it should be possible to make the client
> handle cases #3 and #4, so what's left for the server is to deal with #1
> and #2. That doesn't sound like rocket science to me.

I see that I'm not alone :-)

I would already be satisfied if case #2 were handled by the server and #4 by the client. That would make the LOB the most compatible datatype.

The difference between bytea and LOBs may be a bit confusing sometimes, but I don't see a problem with semantics. The way to access the binary data (with autocommit on or off) is forced by the column's datatype, and not by the Java method which is called.