CopyManager bug + fix
От | Samuel Gendler |
---|---|
Тема | CopyManager bug + fix |
Дата | |
Msg-id | i2pda0294261005080024t212944a0n41d9159c6336f712@mail.gmail.com обсуждение исходный текст |
Список | pgsql-jdbc |
This would have been a bug report with accompanying patch, but I checked the latest code in CVS and found that the most recent commit on CopyManager fixes the exact problem I was having. Now this email is just search engine bait so that others who suffer the problem will have an easier time finding the solution than I did (nothing useful showed up in the google searches I did). I'll start with a description of the problem: I was seeing several different exceptions when copying data from a Reader, but the most prevalent one was an exception stating "ERROR: invalid byte sequence for encoding "UTF8": 0x00" I spent hours trying to figure out where my code was sticking a NULL character into the stream before finally giving up and looking to see if it was a problem in the driver. As soon as I did, I found code in public long copyIn(final String sql, Reader from, int bufferSize); which was giving an entire buffer to the UTF-8 encoder even though only a few bytes were populated. This would always show up for me only on the last buffer of the file, no matter how large, because the data that was being written into the PipedWriter that was feeding the PipedReader was flowing in fast enough to always deliver a full buffer until the final buffer of the 'file.' I would sometimes get errors about missing columns, too, though those were less frequent - just enough to confuse. If you don't want to grab the latest development code for the driver, the bug is very easily fixed with a change to a single line of code in the postgresql-jdbc-8.4-701.src tarball available from the site. in org/postgresql/copyCopyManager.java, modify line 142 from: byte[] buf = encoding.encode(new String(cbuf)); to byte[] buf = encoding.encode(new String(cbuf, 0, len)); which turns out to be identical to the most frequent commit to that file in CVS. One other thing, however - It is worth mentioning somewhere in the documentation (there appears to be no documentation of the Copy feature in the jdbc driver docs that I could find) that using a Reader instead of InputStream is pretty inefficient because of that encoding step. It copies bytes from the Reader, instantiates a String (which copies the bytes into the string), then extracts a byte buffer from that string, which again copies the bytes. In my case, I had already been very careful to ensure that the data I was sending was encoded correctly, so that encoding step was just a bunch of totally redundant data copies. Even digging into the code, this wouldn't have been totally obvious if I hadn't gone in with a debugger and stepped all the way into that code. It would be worth adding a comment so that someone reading the code might at least realize the problem and move to the InputStream version of the method instead. In my case, it was relatively trivial to switch to Piped Input and Output Streams, since I knew that my data was encoded correctly before I copied it in. An alternative would be to allow the user of the api to specify whether the encoding step should be performed or if the data should be trusted. I may attempt such a patch when I get some spare cycles. Thanks for the otherwise excellent codebase (which built so easily and was very easy to read and comprehend). Had I known the code would be so easy to dig into, I'd probably have found the problem much sooner, but I tend to exhaust every other possible option, even when I suspect a piece of 3rd party code, since it is usually so difficult to comprehend, debug, build, and test 3rd party code. --sam
В списке pgsql-jdbc по дате отправления: