Discussion: BLOBs etc
Hi,

on the jdbc-webpages it says, that the JDBC team may decide to change getBlob/setBlob to support bytea only, and that one should use the PostgreSQL-specific LargeObject extension to access them. Well, how should one be able to use the LargeObject extension when connection pools are used? The Connection object supplied by the connection-pool implementation is usually not the one supplied by the JDBC driver, and therefore the object cannot be cast to PGConnection anymore. So the LargeObject extension can no longer be used as suggested on the page

http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

Is there any kind of support for storing data in oid columns which will work with connection-pool implementations?

Thx,
Sven
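The cast failure Sven describes can be reproduced without a real pool: a wrapper that implements only java.sql.Connection is not an instance of the driver's extension interface. A minimal sketch — the PGConnection interface below is an empty stand-in for org.postgresql.PGConnection, and the no-op proxy stands in for a pool's connection wrapper:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;

public class PoolCastDemo {
    // Stand-in for org.postgresql.PGConnection; the real interface
    // lives in the PostgreSQL JDBC driver.
    interface PGConnection {}

    // Build a wrapper the way many pool managers do: a dynamic proxy
    // exposing only the standard java.sql.Connection interface.
    static Connection wrap() {
        InvocationHandler noop = (proxy, method, args) -> null;
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                noop);
    }

    public static void main(String[] args) {
        Connection pooled = wrap();
        // The wrapper is a Connection, but not a PGConnection,
        // so the cast Sven mentions would throw ClassCastException.
        System.out.println(pooled instanceof PGConnection); // prints false
    }
}
```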
Sven Köhler wrote:
> Well, how should one be able to use the LargeObject extension when
> connection pools are used? The Connection object supplied by the
> connection-pool implementation is usually not the one supplied by the
> JDBC driver, and therefore the object cannot be cast to PGConnection
> anymore.

If your connection pool manager uses the driver's ConnectionPoolDataSource implementation this shouldn't be a problem, as the proxy connections also implement PGConnection.

-O
On Fri, 7 Jan 2005, Sven Köhler wrote:

> on the jdbc-webpages it says, that the JDBC team may decide to change
> getBlob/setBlob to support bytea only, and that one should use the
> PostgreSQL-specific LargeObject extension to access them.

I'm not sure where it says that, but it is either out of date or just plain bad advice. If you could point out where it says that I'd appreciate it. I would stick with the standard Blob interface for now. There are no plans to change this until better lob support is added to the server, and I don't know of anyone working on that. Even when that happens, backward compatibility will be provided by a URL parameter.

> Well, how should one be able to use the LargeObject extension when
> connection pools are used? The Connection object supplied by the
> connection-pool implementation is usually not the one supplied by the
> JDBC driver, and therefore the object cannot be cast to PGConnection
> anymore. So the LargeObject extension can no longer be used as
> suggested on the page

That depends on your connection pool. For example, Jakarta's DBCP allows access to the underlying connection like this:

    PoolingDataSource dataSource = ...;
    dataSource.setAccessToUnderlyingConnectionAllowed(true);
    Connection conn = dataSource.getConnection();
    Connection realConn = ((DelegatingConnection) conn).getInnermostDelegate();
    PGConnection pgConn = (PGConnection) realConn;

Kris Jurka
>> on the jdbc-webpages it says, that the JDBC team may decide to change
>> getBlob/setBlob to support bytea only, and that one should use the
>> PostgreSQL-specific LargeObject extension to access them.
>
> I'm not sure where it says that, but it is either out of date or just
> plain bad advice. If you could point out where it says that I'd
> appreciate it. I would stick with the standard Blob interface for now.

http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

There it says:

    Note: In a future release of the JDBC driver, the getBLOB() and
    setBLOB() methods may no longer interact with Large Objects and will
    instead work on the data type bytea. So it is recommended that you
    use the LargeObject API if you intend to use Large Objects.

BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and oid? Shouldn't the JDBC driver be able to determine the used datatype and act accordingly? I'm sure you had your reasons. I'd just like to know them.
On Fri, 7 Jan 2005, Sven Köhler wrote:

>>> on the jdbc-webpages it says, that the JDBC team may decide to change
>>> getBlob/setBlob to support bytea only, and that one should use the
>>> PostgreSQL-specific LargeObject extension to access them.
>>
>> I'm not sure where it says that, but it is either out of date or just
>> plain bad advice. If you could point out where it says that I'd
>> appreciate it. I would stick with the standard Blob interface for now.
>
> http://www.postgresql.org/docs/7.4/static/jdbc-binary-data.html

The 7.4 docs do not really get updated any more, but I will make a change to this in the 8.0 docs. Thanks.

> BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and
> oid? Shouldn't the JDBC driver be able to determine the used datatype
> and act accordingly? I'm sure you had your reasons. I'd just like to
> know them.

It certainly could do that. I'm guessing that it wasn't done to maintain symmetry with the set methods, knowing that only one can work. Making getInputStream work on oid would be easy, but making getBlob work on bytea would be more work to write a wrapper. There is also no real reason to use getBlob on bytea because no streaming is supported.

Do you think the increased flexibility is worth the potential for confusion when the corresponding set method doesn't work?

Kris Jurka
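The dispatch Kris says the driver "certainly could do" amounts to branching on the result column's type OID. A hypothetical sketch — the constants 17 and 26 are PostgreSQL's built-in pg_type OIDs for bytea and oid, while the class and method names are invented for illustration:

```java
public class BinaryDispatch {
    // PostgreSQL built-in type OIDs (from pg_type).
    static final int BYTEA_OID = 17;
    static final int OID_OID   = 26;

    // Decide how a get method would fetch binary data: bytea values
    // arrive inline with the row, while an oid column points at a
    // large object that must be opened through the LO API.
    static String accessPathFor(int columnTypeOid) {
        switch (columnTypeOid) {
            case BYTEA_OID: return "inline";
            case OID_OID:   return "large-object";
            default:
                throw new IllegalArgumentException(
                        "not a binary column: type oid " + columnTypeOid);
        }
    }

    public static void main(String[] args) {
        System.out.println(accessPathFor(BYTEA_OID)); // inline
        System.out.println(accessPathFor(OID_OID));   // large-object
    }
}
```

The set side has no such OID to inspect, which is exactly the asymmetry Kris describes next.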
>> BTW: why don't getInputStream/getBLOB/... work for _both_ bytea and
>> oid? Shouldn't the JDBC driver be able to determine the used datatype
>> and act accordingly? I'm sure you had your reasons. I'd just like to
>> know them.
>
> It certainly could do that. I'm guessing that it wasn't done to
> maintain symmetry with the set methods, knowing that only one can
> work. Making getInputStream work on oid would be easy, but making
> getBlob work on bytea would be more work to write a wrapper. There is
> also no real reason to use getBlob on bytea because no streaming is
> supported.
>
> Do you think the increased flexibility is worth the potential for
> confusion when the corresponding set method doesn't work?

Ohh, well. The "getInputStream/getBLOB/..." above was meant to include the set methods. After all you said, it seems reasonable not to implement set/getBlob() for bytea.

On the other hand, it may lower the compatibility of the JDBC driver with generic applications if there is a datatype for which either set/getBlob() or set/getBinaryStream() doesn't work. So one should avoid using bytea. But am I right that set/getBinaryStream won't work for oid columns?

I think delegating set/getBinaryStream() to set/getBlob() would be easy if the JDBC driver knew when to do it. Would that be possible? That would make oid columns the most compatible, as they would allow the use of both the binary-stream methods and the blob methods. The compatibility of the JDBC driver could be improved by that.
On Fri, 7 Jan 2005, Sven Köhler wrote:

> Ohh, well. The "getInputStream/getBLOB/..." above was meant to include
> the set methods. After all you said, it seems reasonable not to
> implement set/getBlob() for bytea.

The JDBC driver doesn't know what the target table looks like. It must blindly send data and hope it matches. This is why the set methods can only work for one type while the get methods could work for both.

Kris Jurka
> The JDBC driver doesn't know what the target table looks like. It must
> blindly send data and hope it matches. This is why the set methods can
> only work for one type while the get methods could work for both.

Is this going to be improved? Either by using server-side prepared statements or by changing the server's behaviour somehow?
On Fri, 7 Jan 2005, Sven Köhler wrote:

>> The JDBC driver doesn't know what the target table looks like. It must
>> blindly send data and hope it matches. This is why the set methods can
>> only work for one type while the get methods could work for both.
>
> Is this going to be improved? Either by using server-side prepared
> statements or by changing the server's behaviour somehow?

No, this was actually a design decision. It is possible to determine the expected data type in many cases, but the downside is that it requires a network roundtrip to the server. For simple statements this has the potential to nearly double execution time, so we don't want to do that.

Could we perhaps do this for prepared statements we expect to reuse? We could, but then you've introduced an odd inconsistency where sometimes things will work and sometimes they won't.

Kris Jurka
>>> The JDBC driver doesn't know what the target table looks like. It must
>>> blindly send data and hope it matches. This is why the set methods can
>>> only work for one type while the get methods could work for both.
>>
>> Is this going to be improved? Either by using server-side prepared
>> statements or by changing the server's behaviour somehow?
>
> No, this was actually a design decision. It is possible to determine the
> expected data type in many cases, but the downside is that it requires a
> network roundtrip to the server. For simple statements this has the
> potential to nearly double execution time, so we don't want to do that.

I see the dilemma.

> Could we perhaps do this for prepared statements we expect to reuse? We
> could, but then you've introduced an odd inconsistency where sometimes
> things will work and sometimes they won't.

An inconsistency is not tolerable. But still, the PostgreSQL server could accept the data generated by the JDBC driver's setBinaryStream() even for oid columns. Isn't that the missing piece to make the set/getBinaryStream() methods work for oid columns?

Is it known how other JDBC drivers handle these problems? Do they only implement set/getBinaryStream() or set/getBlob()? I'd expect set/getBinaryStream() to work at least, since it is the most simple way to get the data.

I don't want to bother you any longer if other drivers aren't any better, but it seems to me like there's no uniform way to get binary data from a database via JDBC.
On Fri, 7 Jan 2005, Sven Köhler wrote:

> But still, the PostgreSQL server could accept the data generated by the
> JDBC driver's setBinaryStream() even for oid columns. Isn't that the
> missing piece to make the set/getBinaryStream() methods work for oid
> columns?

That would work, but it's not going to happen. Setting up and using a large object is completely different than just stashing data in a bytea field. Convincing the server to do simple conversions is difficult enough; getting it to do something of this magnitude is out of the question.

> Is it known how other JDBC drivers handle these problems?

The real problem here is that pg has two binary data types that work quite differently and each has significant strengths/weaknesses. Other databases don't have this predicament. If we only had one, or one was clearly superior, or they used a remotely similar API, we'd be set.

Kris Jurka
On Fri, 7 Jan 2005, Thomas Hallgren wrote:

> Kris Jurka wrote:
>> That would work, but it's not going to happen. Setting up and using a
>> large object is completely different than just stashing data in a bytea
>> field. Convincing the server to do simple conversions is difficult
>> enough; getting it to do something of this magnitude is out of the
>> question.
>
> Why? After all, both types represent a sequence of bytes, so it's not
> really doing a conversion. It's just allowing multiple ways of accessing
> it.
>
> I'm not sure it would be that much work to actually make the server
> accept both "by value" and "by stream" semantics for both types.
> Implemented correctly, all clients (not just JDBC) could benefit.

I'm not clear what you're advocating in concrete terms. A new pseudo-type "binary data" that could be used until it needs to be converted into a concrete type? When does this conversion have to happen? Consider cases like:

    SELECT ? = ?;
    SELECT myfunc(?);
    INSERT INTO mytab
        SELECT * FROM (SELECT 1 UNION SELECT 2) t1(a)
        LEFT JOIN (SELECT 1, ?) t2(a) ON (t1.a = t2.a);

In the abstract, sure, they're both just streams of bytes, but I think when you have to actually handle this in the server you'll find they are not so similar.

Note also that when streaming a large object to the server, you are streaming it directly into permanent storage a chunk at a time, so not much memory is used. If you don't know where it is going you can't stream it to its destination, and you've got to either put it into a temporary disk location or keep it in memory.

Kris Jurka
Kris Jurka wrote:
> That would work, but it's not going to happen. Setting up and using a
> large object is completely different than just stashing data in a bytea
> field. Convincing the server to do simple conversions is difficult
> enough; getting it to do something of this magnitude is out of the
> question.

Why? After all, both types represent a sequence of bytes, so it's not really doing a conversion. It's just allowing multiple ways of accessing it.

I'm not sure it would be that much work to actually make the server accept both "by value" and "by stream" semantics for both types. Implemented correctly, all clients (not just JDBC) could benefit.

Regards, Thomas Hallgren
Kris,

> I'm not clear what you're advocating in concrete terms. A new pseudo-type
> "binary data" that could be used until it needs to be converted into a
> concrete type? When does this conversion have to happen?

Couldn't this be limited to the client/server protocol? There are 4 cases that will fail today. If you attempt to:

1. stream LOB data from the client for storage in a bytea.
2. send bytea data "by value" for storage in a LOB.
3. read LOB data that is passed as a bytea.
4. read bytea data that is really a LOB.

As you pointed out earlier, it should be possible to make the client handle cases #3 and #4, so what's left for the server is to deal with #1 and #2. That doesn't sound like rocket science to me.

Regards, Thomas Hallgren
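Thomas's four failing cases all reduce to a mismatch between the client's access style and the server-side type, in either direction: streaming pairs with LOBs, by-value pairs with bytea, and the two cross combinations fail. An illustrative encoding of that matrix (all names here are invented, nothing in the driver or server is named this way):

```java
public class BinaryCompat {
    enum ClientStyle { STREAM, BY_VALUE }
    enum ServerType  { LOB, BYTEA }

    // Today a transfer only works when the access style matches the
    // stored type; the two mismatched combinations are the failing
    // cases #1-#4 listed above (writes and reads alike).
    static boolean works(ClientStyle style, ServerType type) {
        return (style == ClientStyle.STREAM) == (type == ServerType.LOB);
    }

    public static void main(String[] args) {
        // case 1: streaming into a bytea fails
        System.out.println(works(ClientStyle.STREAM, ServerType.BYTEA));
        // case 2: sending by value into a LOB fails
        System.out.println(works(ClientStyle.BY_VALUE, ServerType.LOB));
    }
}
```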
On Fri, 7 Jan 2005, Thomas Hallgren wrote:

> Kris,
>> I'm not clear what you're advocating in concrete terms. A new pseudo-type
>> "binary data" that could be used until it needs to be converted into a
>> concrete type? When does this conversion have to happen?
>
> Couldn't this be limited to the client/server protocol? There are 4
> cases that will fail today. If you attempt to:
>
> 1. stream LOB data from the client for storage in a bytea.
> 2. send bytea data "by value" for storage in a LOB.
> 3. read LOB data that is passed as a bytea.
> 4. read bytea data that is really a LOB.
>
> As you pointed out earlier, it should be possible to make the client
> handle cases #3 and #4, so what's left for the server is to deal with #1
> and #2. That doesn't sound like rocket science to me.

Well, you've summarized the problem, but you certainly haven't given any concrete suggestions on how this could actually be done. I'm not sure this even qualifies as handwaving.

Kris Jurka
Thomas Hallgren wrote:
> Kris Jurka wrote:
>> That would work, but it's not going to happen. Setting up and using a
>> large object is completely different than just stashing data in a bytea
>> field. Convincing the server to do simple conversions is difficult
>> enough; getting it to do something of this magnitude is out of the
>> question.
>
> Why? After all, both types represent a sequence of bytes, so it's not
> really doing a conversion. It's just allowing multiple ways of accessing
> it.

The "different ways" involve different transaction semantics. You can't currently deal with LOBs when autocommit is on. How would you deal with that?

-O
Kris,

> Well, you've summarized the problem, but you certainly haven't given any
> concrete suggestions on how this could actually be done. I'm not sure
> this even qualifies as handwaving.

Hmm, handwaving or not, your statement "getting it to do something of this magnitude is out of the question" perhaps triggered a certain urge to define what that "something" really is. My intention was to summarize the problem and show that it isn't that big a deal. I don't think it would take that much imagination to figure out what needs to be done.

Regards, Thomas Hallgren
Oliver Jowett wrote:
> The "different ways" involve different transaction semantics. You can't
> currently deal with LOBs when autocommit is on. How would you deal with
> that?

I'm aware that the transaction semantics are somewhat different, and I've proposed a solution for that on the hackers list. Please read my proposal in the thread "Status of the server side Large Object support".

Regards, Thomas Hallgren
Thomas Hallgren wrote:
> Oliver Jowett wrote:
>> The "different ways" involve different transaction semantics. You
>> can't currently deal with LOBs when autocommit is on. How would you
>> deal with that?
>
> I'm aware that the transaction semantics are somewhat different, and I've
> proposed a solution for that on the hackers list. Please read my
> proposal in the thread "Status of the server side Large Object support".

The thread in question is a month old; here's an archive link for the benefit of the list:

http://archives.postgresql.org/pgsql-hackers/2004-11/msg01018.php

The followups seemed to be along the lines of "use EXTERNAL storage and improve bytea to support partial updates". Is your proposal a serious candidate for merging?

-O
Oliver Jowett wrote:
> The thread in question is a month old; here's an archive link for the
> benefit of the list:
>
> http://archives.postgresql.org/pgsql-hackers/2004-11/msg01018.php

Yes, that's the thread; for some reason I wasn't able to read the archives last night. The posting I was referring to is this one:

http://archives.postgresql.org/pgsql-hackers/2004-11/msg01106.php

> The followups seemed to be along the lines of "use EXTERNAL storage and
> improve bytea to support partial updates". Is your proposal a serious
> candidate for merging?

I didn't receive any comments on that proposal, so nobody seemed to be particularly interested. Should the question come up again at a convenient time, and if some more people would provide feedback and perhaps help out, then yes, I might consider actually contributing what I propose.

I did some more thinking about transaction semantics and auto-commit. The above link will not really solve what you're after, I think. But I'm not sure that the auto-commit differences will pose a problem.

Let's assume that you have a bytea in a table and that you want to store something in that table using a Blob or binary stream in the client. The way I see it, this can be done in one of two ways.

1. You prepare a statement, you do a setBinaryStream, and then an executeUpdate. This is only one single statement, so the auto-commit setting shouldn't really make a difference. The server must be changed to accommodate that it receives stream data instead of a bytea, but that's basically it.

2. You do a select and use a getBlob() on the ResultSet. The JDBC driver that reads the result will conclude that the column is actually a bytea, so it will treat it as such and wrap it in a special Blob implementation in the client. The Blob instance must contain a primary key, a column identifier, and the data.
If auto-commit is true, then each change made to this Blob using setBytes, or writes through a stream obtained using getOutputStream, would be committed immediately. Not very efficient, but it would work. Another possibility is of course to consider this illegal.

Now assume that it is the other way around. The database contains a LOB and you want to change it using setBytes. Again, this is a single statement, so auto-commit shouldn't make a difference. The server must be able to take the bytea data it gets when it was expecting stream data and transform that into a LOB access.

Do you see more issues with auto-commit?

Regards, Thomas Hallgren
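The client-side wrapper in way 2 could look roughly like this: a Blob-style view over a bytea value already fetched with the row. Only the read side is sketched, the class name and shape are invented (a real driver class would implement java.sql.Blob and carry the primary key and column identifier needed to write changes back), and java.sql.Blob's 1-based position convention is kept:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;

// Illustrative only: a read-only, Blob-like view over a bytea value
// that arrived inline with the result row.
public class ByteaBlob {
    private final byte[] data;

    public ByteaBlob(byte[] data) {
        // Copy so later changes to the caller's array don't leak in.
        this.data = data.clone();
    }

    public long length() {
        return data.length;
    }

    // java.sql.Blob positions are 1-based, so translate to an
    // array offset and clamp the requested length to the data.
    public byte[] getBytes(long pos, int length) {
        int start = (int) pos - 1;
        int end = Math.min(start + length, data.length);
        return Arrays.copyOfRange(data, start, end);
    }

    public InputStream getBinaryStream() {
        return new ByteArrayInputStream(data);
    }
}
```

Since the whole value is already in client memory, there is no streaming benefit here, which matches Kris's earlier point that getBlob on bytea buys little beyond interface compatibility.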
Thomas Hallgren wrote:
> The server must be able to take the bytea data it gets when it was
> expecting stream data and transform that into a LOB access.

Is doing this transformation work really worth it? If you're going to do non-trivial server changes, aren't you better off improving bytea to solve the problems it has with large binary data and using bytea everywhere?

-O
Oliver Jowett wrote:
> Is doing this transformation work really worth it? If you're going to do
> non-trivial server changes, aren't you better off improving bytea to
> solve the problems it has with large binary data and using bytea
> everywhere?

Probably. My LOB change suggestion was more based on keeping things simple and not changing the server much. As you point out, merging that with a more significant protocol change will impose more changes anyhow, so why not go the full route and get rid of LOBs as they stand today altogether.

In any case, a good start would be to allow stream access to a bytea on the server side, so that the client can use Blob access and binary streams.

A related topic is of course text and Clobs. It would be nice if they could be handled the same way, with the addition of the needed encoding support.

Regards, Thomas Hallgren
>> I'm not clear what you're advocating in concrete terms. A new pseudo-type
>> "binary data" that could be used until it needs to be converted into a
>> concrete type? When does this conversion have to happen?
>
> Couldn't this be limited to the client/server protocol? There are 4
> cases that will fail today. If you attempt to:
>
> 1. stream LOB data from the client for storage in a bytea.
> 2. send bytea data "by value" for storage in a LOB.
> 3. read LOB data that is passed as a bytea.
> 4. read bytea data that is really a LOB.
>
> As you pointed out earlier, it should be possible to make the client
> handle cases #3 and #4, so what's left for the server is to deal with #1
> and #2. That doesn't sound like rocket science to me.

I see that I'm not alone :-)

I would already be satisfied if case #2 were handled by the server and #4 by the client. That would make the LOB the most compatible datatype.

The difference between bytea and LOBs may be a bit confusing sometimes, but I don't see a problem with semantics. The way to access the binary data (with autocommit on or off) is forced by the column's datatype, and not by the Java method which is called.