Re: COPY FROM performance improvements

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: COPY FROM performance improvements
Дата
Msg-id 26467.1123362292@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: COPY FROM performance improvements  ("Alon Goldshuv" <agoldshuv@greenplum.com>)
Ответы Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Список pgsql-patches
"Alon Goldshuv" <agoldshuv@greenplum.com> writes:
> New patch attached. It includes very minor changes. These are changes that
> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
> previous patch.

I've applied this with (rather extensive) revisions.  I didn't like what
you had done with the control structure --- loading the input buffer
only at the outermost loop level was a bad design choice IMHO.  You had
sprinkled the code with an unreasonable number of special cases in order
to try to cope with the effects of that mistake, but there were lots
of problems still left.  Some of the bugs I noticed:

* Broke old-protocol COPY, since that has no provision for stopping at
the EOF marker except by parsing the data carefully to start with.  The
backend would just hang up unless the total data size chanced to be a
multiple of 64K.

* Subtle change in interpretation of \. EOF marker (the existing code
will recognize it even when not at start of line).

* Seems to have thrown away detection of newline format discrepancies.

* Fails for zero-column tables.

* Broke display of column values during error context callback (would
always show the first column contents no matter which one is being
complained of).

* DetectLineEnd mistakenly assumes CR mode if very last character of first
bufferload is CR; need to reserve judgment until next char is available.

* DetectLineEnd fails to account for backslashed control characters,
so it will e.g. accept \ followed by \n as determining the newline
style.

* Fails to apply encoding conversion if first line exceeds copy buf
size, because when DetectLineEnd fails the quick-exit path doesn't do
it.

* There seem to be several bugs associated with the fact that input_buf[]
always has 64K of data in it even when EOF has been reached on the
input.  One example:
    echo -n 123 >zzz1
    psql> create temp table t1(f1 text);
    psql> copy t1 from '/home/tgl/zzz1';
    psql> select * from t1;
hmm ... where'd that 64K of whitespace come from?

I rewrote the patch in a way that retained most of the speedups without
changing the basic control structure (except for replacing multiple
CopyReadAttribute calls with one CopyReadAttributes call per line).

I had some difficulty in generating test cases that weren't largely
I/O-bound, but AFAICT the patch as applied is about the same speed
as what you submitted.

            regards, tom lane

В списке pgsql-patches по дате отправления:

Предыдущее
От: Mark Wong
Дата:
Сообщение: Re: [HACKERS] O_DIRECT for WAL writes
Следующее
От: "Luke Lonergan"
Дата:
Сообщение: Re: COPY FROM performance improvements