Re: Improvements in Copy From

Поиск
Список
Период
Сортировка
От vignesh C
Тема Re: Improvements in Copy From
Дата
Msg-id CALDaNm0HioOL3f5+YjAywWWenDg=azTL8oC=7r56c7zCce12pA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improvements in Copy From  (Surafel Temesgen <surafel3000@gmail.com>)
Ответы Re: Improvements in Copy From
Список pgsql-hackers
On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
>
> Hi Vignesh
>
> On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>> Hi,
>>
>> While reviewing copy from I identified few  improvements for copy from
>> that can be done :
>> a) copy from stdin copies lesser amount of data to buffer even though
>> space is available in buffer because minread was passed as 1 to
>> CopyGetData, Hence it only reads until the data read from libpq is
>> less than minread. This can be fixed by passing the actual space
>> available in buffer, this reduces the unnecessary frequent calls to
>> CopyGetData.
>
>
> why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

>
>>
>> c) Copy from reads header line and do nothing for the header line, we
>> need not clear EOL & need not convert to server encoding for the
>> header line.
>
>
> We have a patch for column matching feature [1] that may need a header line to be further processed. Even without
thatI think it is preferable to process the header line for nothing than adding those checks to the loop,
performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: SIGQUIT handling, redux
Следующее
От: Oleg Bartunov
Дата:
Сообщение: Re: Yet another fast GiST build