Обсуждение: pgsql: Our code had: if (c == '\\' &&
Log Message: ----------- Our code had: if (c == '\\' && cstate->line_buf.len == 0) The problem with that is the because of the input and _output_ buffering, cstate->line_buf.len could be zero even if we are not on the first character of a line. In fact, for a typical line, it is zero for all characters on the line. The proper solution is to introduce a boolean, first_char_in_line, that we set as we enter the loop and clear once we process a character. I have restructured the line-reading code in copy.c by: o merging the CSV/non-CSV functions into a single function o used macros to centralize and clarify the buffering code o updated comments o renamed client_encoding_only to encoding_embeds_ascii o added a high-bit test to the encoding_embeds_ascii test for performance o in CSV mode, allow a backslash followed by a non-period to continue being processed as a data value There should be no performance impact from this patch because it is functionally equivalent. If you apply the patch you will see copy.c is much clearer in this area now and might suggest additional optimizations. I have also attached a 8.1-only patch to fix the CSV \. handling bug with no code restructuring. Modified Files: -------------- pgsql/src/backend/commands: copy.c (r1.255 -> r1.256) (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.255&r2=1.256)
Bruce – have you tested the performance before and after?
- Luke
On 12/27/05 10:10 AM, "momjian@postgresql.org" <momjian@postgresql.org> wrote:
- Luke
On 12/27/05 10:10 AM, "momjian@postgresql.org" <momjian@postgresql.org> wrote:
Log Message:
-----------
Our code had:
if (c == '\\' && cstate->line_buf.len == 0)
The problem with that is the because of the input and _output_
buffering, cstate->line_buf.len could be zero even if we are not on the
first character of a line. In fact, for a typical line, it is zero for
all characters on the line. The proper solution is to introduce a
boolean, first_char_in_line, that we set as we enter the loop and clear
once we process a character.
I have restructured the line-reading code in copy.c by:
o merging the CSV/non-CSV functions into a single function
o used macros to centralize and clarify the buffering code
o updated comments
o renamed client_encoding_only to encoding_embeds_ascii
o added a high-bit test to the encoding_embeds_ascii test for
performance
o in CSV mode, allow a backslash followed by a non-period to
continue being processed as a data value
There should be no performance impact from this patch because it is
functionally equivalent. If you apply the patch you will see copy.c is
much clearer in this area now and might suggest additional
optimizations.
I have also attached a 8.1-only patch to fix the CSV \. handling bug
with no code restructuring.
Modified Files:
--------------
pgsql/src/backend/commands:
copy.c (r1.255 -> r1.256)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.255=1.256 <http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.255&r2=1.256> )
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly