Обсуждение: Stream Copy for 8.1 - 8.3dev
Stream Copy extension for 8.1 - 8.3dev drivers is available with examples at http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ Cheers, -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
On Wed, 27 Jun 2007, Kalle Hallivuori wrote: > Stream Copy extension for 8.1 - 8.3dev drivers is available with examples at > > http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ > Some thoughts after giving this a read (but not a test). 1) Your CopyObjects example pays no attention to encoding or escaping. Calling getBytes() will give you bytes in the JVM's character set which can easily be different than the client_encoding which the driver will always set to UTF-8 (for 7.3 and up). You also have to think about what happens when your data contains a null escape, delimiter, or newline itself. 2) I'm not sure it's helpful to have the copy methods throw both an IO and SQL exception. Why not just wrap any IOExceptions inside a SQLException? 3) What is the purpose of the reuse_buffer parameter? What is the use case for someone wanting a "fresh" buffer every time? 4) buildCopyQuery doesn't handle quoting/escaping of identifiers. By providing something like this you're now responsible for all the "hard" stuff. I would leave it out unless you're prepared to do a lot more thinking about it. It also doesn't handle all the possible copy options. Perhaps you should split this into core copy functionality and a helper class that builds upon it and can provide other useful things (escaping / conversion of java objects to pg datatypes). 5) The coding in QueryExecutorImpl is unsafe because once you get CopyInResponse you loop firing away data without listening for any return data. Consider a table which had a trigger on it which issued a notice for each row it received. If you don't read from the server the server will block and then you'll block sending it data. See the comments in core/v3/QueryExecutorImpl near MAX_BUFFERED_QUERIES for more details. 6) You mix warnings and errors together. They should be kept separate. 7) When you get CopyResponses and the user hasn't provided the appropriate stream you bail leaving the protocol in an unknown state. You should issue a CopyFail and wait for ReadyForQuery so the whole connection isn't lost. Kris Jurka
Hi Kris! 2007/7/16, Kris Jurka <books@ejurka.com>: > > http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ > > Some thoughts after giving this a read (but not a test). Thanks a lot! > 1) Your CopyObjects example pays no attention to encoding or escaping. > Calling getBytes() will give you bytes in the JVM's character set which > can easily be different than the client_encoding which the driver will > always set to UTF-8 (for 7.3 and up). You also have to think about what > happens when your data contains a null escape, delimiter, or newline > itself. Yes, it's very quick and dirty example. I'll look into providing a more thorough mediator class as you propose below. > 2) I'm not sure it's helpful to have the copy methods throw both an IO and > SQL exception. Why not just wrap any IOExceptions inside a SQLException? I thought SQLException means that the transaction failed, but connection is still usable, whereas an IOException leaves the connection in an unknown state so that it must be dropped. But since that is not consistent with the rest of the driver, it's obviously wrong thought :) I'll wrap'em. > 3) What is the purpose of the reuse_buffer parameter? What is the use > case for someone wanting a "fresh" buffer every time? A custom stream might store them internally. Obviously I should document such details before giving the code out for review. > 4) buildCopyQuery doesn't handle quoting/escaping of identifiers. By > providing something like this you're now responsible for all the "hard" > stuff. I would leave it out unless you're prepared to do a lot more > thinking about it. It also doesn't handle all the possible copy options. I'll just drop the query building functionality. It doesn't really offer much value, as the caller has to be familiar with the syntax anyway. > Perhaps you should split this into core copy functionality and a helper > class that builds upon it and can provide other useful things (escaping / > conversion of java objects to pg datatypes). I'll look into providing a couple of helper classes to target these. On one hand I think we should honor the copy philosophy of keeping the raw data as flat as possible. On the other hand, providing a full object interface on top of the former would be definitely sexy :) > 5) The coding in QueryExecutorImpl is unsafe because once you get > CopyInResponse you loop firing away data without listening for any return > data. Consider a table which had a trigger on it which issued a notice > for each row it received. If you don't read from the server the server > will block and then you'll block sending it data. See the comments in > core/v3/QueryExecutorImpl near MAX_BUFFERED_QUERIES for more details. I'll fix that. Thanks! > 6) You mix warnings and errors together. They should be kept separate. Ok, I'll fix that too. > 7) When you get CopyResponses and the user hasn't provided the appropriate > stream you bail leaving the protocol in an unknown state. You should > issue a CopyFail and wait for ReadyForQuery so the whole connection isn't > lost. Oops, that's really lousy of me. Will fix. I'll let you know once I have a fixed version available. Cheers, -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Hi again. Fixed versions of copy patches and drivers are now available at http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ Unit tests do pass, but I had to postpone any more extensive tests for now. 2007/7/16, Kris Jurka <books@ejurka.com>: > 1) Your CopyObjects example pays no attention to encoding or escaping. > Calling getBytes() will give you bytes in the JVM's character set which > can easily be different than the client_encoding which the driver will > always set to UTF-8 (for 7.3 and up). You also have to think about what > happens when your data contains a null escape, delimiter, or newline > itself. Example dropped for now. Once the core implementation is considered acceptable, I'll add documentation and helper classes. > 2) I'm not sure it's helpful to have the copy methods throw both an IO and > SQL exception. Why not just wrap any IOExceptions inside a SQLException? Wrapped as suggested. > 3) What is the purpose of the reuse_buffer parameter? What is the use > case for someone wanting a "fresh" buffer every time? Dropped as unnecessary. > 4) buildCopyQuery doesn't handle quoting/escaping of identifiers. By > providing something like this you're now responsible for all the "hard" > stuff. I would leave it out unless you're prepared to do a lot more > thinking about it. It also doesn't handle all the possible copy options. > Perhaps you should split this into core copy functionality and a helper > class that builds upon it and can provide other useful things (escaping / > conversion of java objects to pg datatypes). Dropped as unnecessary. > 5) The coding in QueryExecutorImpl is unsafe because once you get > CopyInResponse you loop firing away data without listening for any return > data. Consider a table which had a trigger on it which issued a notice > for each row it received. If you don't read from the server the server > will block and then you'll block sending it data. See the comments in > core/v3/QueryExecutorImpl near MAX_BUFFERED_QUERIES for more details. Changed so that data is sent only while the server stays quiet. (Apparently I should write a test for this.) > 6) You mix warnings and errors together. They should be kept separate. Separated warnings from errors. After succesful recovery from copy subprotocol state: - if any errors were catched, they're thrown and warnings get dropped as a side effect; - if any warnings got collected, they're thrown, since I'm unsure about how to handle them. > 7) When you get CopyResponses and the user hasn't provided the appropriate > stream you bail leaving the protocol in an unknown state. You should > issue a CopyFail and wait for ReadyForQuery so the whole connection isn't > lost. Fixed as suggested. -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Kalle Hallivuori wrote: > - if any warnings got collected, they're thrown, since I'm unsure > about how to handle them. Usually SQLWarnings get added to the warning chain of the generating object -- I suppose the Connection in this case. -O
Hi! 2007/7/17, Oliver Jowett <oliver@opencloud.com>: > Usually SQLWarnings get added to the warning chain of the generating > object -- I suppose the Connection in this case. Thanks for the hint. Applied. New version available at http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ Cheers, -- Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/