Re: [PATCH] Performance Improvement For Copy From Binary Files

From: Amit Langote
Subject: Re: [PATCH] Performance Improvement For Copy From Binary Files
Date:
Msg-id: CA+HiwqF02206N8vQnM-C0t74XHce52umVN3HN0Zn+hwXsHpmfA@mail.gmail.com
In reply to: [PATCH] Performance Improvement For Copy From Binary Files  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses: Re: [PATCH] Performance Improvement For Copy From Binary Files  (Amit Langote <amitlangote09@gmail.com>)
           Re: [PATCH] Performance Improvement For Copy From Binary Files  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers
Hi Bharath,

On Mon, Jun 29, 2020 at 2:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> For COPY FROM with binary files, the following information exists for each tuple/row:
> 1. field count (number of columns)
> 2. for every field, the field size (column data length)
> 3. field data of field size (actual column data)
>
> Currently, all of the above data is read directly from the file using fread() at each step, and this happens for all
> the tuples/rows.
>
> One observation is that, of the total execution time of a COPY FROM a binary file, the fread() calls take up to 20%
> of the time, and the fread() call count is also very high.
>
> For instance, with a 5.3GB dataset of 10 million tuples with 10 columns:
>
>   total exec time (sec)   total fread() time (sec)   fread() call count
>   101.193                 21.33                      210000005
>   101.345                 21.436                     210000005
>
> The total time taken by fread() and the corresponding call count increase further with more columns, for instance
> 1000.
>
> One solution to this problem is to read data from the binary file in RAW_BUF_SIZE (64KB) chunks to avoid repeatedly
> calling fread() (thus possibly avoiding a few disk I/Os). This is similar to the approach followed for csv/text files.
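(As an aside for readers of the thread: the per-row reads described above boil
down to roughly the pattern below.  This is only an illustrative sketch of how
the binary COPY format is consumed, not the actual copy.c code; it also shows
where the roughly 2 * ncolumns + 1 fread() calls per row come from.)

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>              /* ntohs/ntohl; lengths are big-endian */

/*
 * Read one row of a binary COPY file; returns the field count, 0 at the
 * end-of-data marker, or -1 on error/EOF.
 */
static int
read_one_row(FILE *fp, char *databuf, size_t databuf_size)
{
    int16_t     fld_count;
    int32_t     fld_size;

    if (fread(&fld_count, sizeof(fld_count), 1, fp) != 1)  /* 1 per row */
        return -1;
    fld_count = (int16_t) ntohs((uint16_t) fld_count);
    if (fld_count == -1)
        return 0;                   /* -1 field count marks end of data */

    for (int i = 0; i < fld_count; i++)
    {
        if (fread(&fld_size, sizeof(fld_size), 1, fp) != 1) /* 1 per field */
            return -1;
        fld_size = (int32_t) ntohl((uint32_t) fld_size);
        if (fld_size == -1)
            continue;               /* NULL field, no data bytes follow */
        if ((size_t) fld_size > databuf_size ||
            fread(databuf, 1, (size_t) fld_size, fp) != (size_t) fld_size)
            return -1;              /* 1 more per non-NULL field */
    }
    return fld_count;
}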

I agree that having the buffer in front of the file makes sense,
although we do now have an extra memcpy, that is, from raw_buf to
attribute_buf.data.  Currently, fread() reads directly into
attribute_buf.data.  But maybe that's okay as I don't see the new copy
being all that bad.

> Attaching a patch implementing the above solution for binary format files.
>
> Below is the improvement gained:
>
>   total exec time (sec)   total fread() time (sec)   fread() call count
>   75.757                  2.73                       160884
>   75.351                  2.742                      160884
>
> Execution is 1.36x faster, fread() time is reduced by 87%, and the fread() call count is reduced by 99%.
>
> I request the community to review this patch if the approach and the improvement seem beneficial.
>
> Any suggestions to improve further are most welcome.

Noticed the following misbehaviors when trying to test the patch:

create table foo5 (a text, b text, c text, d text, e text);
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
'd', 'eee' from generate_series(1, 10000000);
copy foo5 to '/tmp/foo5.bin' binary;
truncate foo5;
copy foo5 from '/tmp/foo5.bin' binary;
ERROR:  unexpected EOF in COPY data
CONTEXT:  COPY foo5, line 33, column a

create table bar (a numeric);
insert into bar select sqrt(a) from generate_series(1, 10000) a;
copy bar to '/tmp/bar.bin' binary;
copy bar from '/tmp/bar.bin' binary;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Trying to figure out what was going wrong in each of these cases, I found
the new code a bit hard to navigate and debug :(. Here are a couple of
points that I think could have made things a bit easier:

* Avoid spreading the new buffering logic across multiple existing
functions, with similar pieces of code repeated in multiple places.  I
would add a single new function that takes care of the various
buffering details and call it wherever CopyGetData() is currently being
used; a rough sketch of what such a helper might look like follows after
the next point.

* You could've reused CopyLoadRawBuf()'s functionality instead of
reimplementing it.  I also see multiple instances of special-case
handling, which often suggests that bugs are lurking.
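
For illustration, such a helper might look roughly like the following.  This
is only a sketch, not the attached patch (the name CopyReadBinaryData is made
up here); it assumes the existing raw_buf, raw_buf_index, and raw_buf_len
fields and CopyLoadRawBuf() in copy.c:

/*
 * Read up to 'nbytes' bytes of binary COPY data into 'dest', serving them
 * out of raw_buf and refilling it in RAW_BUF_SIZE chunks as needed.
 * Returns the number of bytes actually read (less than 'nbytes' only at
 * EOF).
 */
static int
CopyReadBinaryData(CopyState cstate, char *dest, int nbytes)
{
    int         copied_bytes = 0;

    while (copied_bytes < nbytes)
    {
        int         avail = cstate->raw_buf_len - cstate->raw_buf_index;

        if (avail == 0)
        {
            /* Buffer exhausted; load the next chunk from the file. */
            if (!CopyLoadRawBuf(cstate))
                break;          /* EOF */
            avail = cstate->raw_buf_len - cstate->raw_buf_index;
        }

        /* Copy as much as is available, up to what the caller asked for. */
        avail = Min(avail, nbytes - copied_bytes);
        memcpy(dest + copied_bytes,
               &cstate->raw_buf[cstate->raw_buf_index], avail);
        cstate->raw_buf_index += avail;
        copied_bytes += avail;
    }

    return copied_bytes;
}

The field-count, field-length, and field-data reads could then all funnel
through this one function, so data that straddles a buffer boundary needs no
special casing.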

Considering these points, I came up with the attached patch with a
much smaller footprint.  As far as I can see, it implements the same
basic idea as your patch.  With it, I was able to see an improvement
in loading time consistent with your numbers.  I measured the time of
loading 10 million rows into tables with 5, 10, and 20 columns as
follows:

create table foo5 (a text, b text, c text, d text, e text);
insert into foo5 select repeat('a', (random()*100)::int), 'bbb', 'cc',
'd', 'eee' from generate_series(1, 10000000);
copy foo5 to '/tmp/foo5.bin' binary;
truncate foo5;
copy foo5 from '/tmp/foo5.bin' binary;

create table foo10 (a text, b text, c text, d text, e text, f text, g
text, h text, i text, j text);
insert into foo10 select repeat('a', (random()*100)::int), 'bbb',
'cc', 'd', 'eee', 'f', 'gg', 'hh', 'i', 'jjj' from generate_series(1,
10000000);
copy foo10 to '/tmp/foo10.bin' binary;
truncate foo10;
copy foo10 from '/tmp/foo10.bin' binary;

create table foo20 (a text, b text, c text, d text, e text, f numeric,
g text, h text, i text, j text, k text, l text, m text, n text, o
text, p text, q text, r text, s text, t text);
insert into foo20 select repeat('a', (random()*100)::int), 'bbb',
'cc', 'd', 'eee', '123.456', 'gg', 'hh', 'ii', 'jjjj', 'kkk', 'llll',
'mm', 'n', 'ooooo', 'pppp', 'q', 'rrrrr', 'ss', 'tttttttttttt' from
generate_series(1, 10000000);
copy foo20 to '/tmp/foo20.bin' binary;
truncate foo20;
copy foo20 from '/tmp/foo20.bin' binary;

The median times for the COPY FROM commands above, with and without
the patch, are as follows:

            HEAD    patched
foo5        8.5     6.5
foo10       14      10
foo20       25      18

A few more points to remember in the future:

* Commenting style:

+       /* If readbytes are lesser than the requested bytes, then initialize the
+       * remaining bytes in the raw_buf to 0. This will be useful for checking
+       * error "received copy data after EOF marker".
+       */

Multi-line comments are started like this:

    /*
     * <Start here>
     */

* As also mentioned above, it's a good idea in general to avoid having
special cases like these in the code:

+       if (cstate->cur_lineno == 1)
        {
-           /* EOF detected (end of file, or protocol-level EOF) */
-           return false;
+           /* This is for the first time, so read in buff size amount
+            * of data from file.
+            */

...

+
+           /* Move bytes can either be 0, 1, or 2. */
+           movebytes = RAW_BUF_SIZE - cstate->raw_buf_index;

...

+       uint8       movebytes = 0;
+
+       /* Move bytes can either be 0, 1, 2, 3 or 4. */
+       movebytes = RAW_BUF_SIZE - cstate->raw_buf_index;

* Please try to make variable names short if you can or follow the
guidelines around long names:

+       int32 remainingbytesincurrdatablock = RAW_BUF_SIZE -
cstate->raw_buf_index;

Maybe remaining_bytes would've sufficed here, because "in the current
data block" might be clear to most readers by looking at the
surrounding code.

* The above point also helps avoid long code lines that don't fit
within 78 characters, like these:

+       memcpy(&cstate->attribute_buf.data[0], &cstate->raw_buf[cstate->raw_buf_index], remainingbytesincurrdatablock);
+
+       if (CopyGetData(cstate, &cstate->attribute_buf.data[remainingbytesincurrdatablock],
+                   (fld_size - remainingbytesincurrdatablock), (fld_size - remainingbytesincurrdatablock)) != (fld_size - remainingbytesincurrdatablock))
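
For example, with remaining_bytes the same statements can be laid out within
the limit (just an illustration of the reformatting, not code taken from
either patch):

        memcpy(&cstate->attribute_buf.data[0],
               &cstate->raw_buf[cstate->raw_buf_index], remaining_bytes);

        if (CopyGetData(cstate, &cstate->attribute_buf.data[remaining_bytes],
                        fld_size - remaining_bytes,
                        fld_size - remaining_bytes) != fld_size - remaining_bytes)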

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
