Re: Parallel copy

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Parallel copy
Дата
Msg-id CALj2ACW94icER3WrWapon7JkcX8j0TGRue5ycWMTEvgA3X7fOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Ответы Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
Hi,

Thanks Vignesh for reviewing parallel copy for binary format files
patch. I tried to address the comments in the attached patch
(0006-Parallel-Copy-For-Binary-Format-Files.patch).

On Thu, Jun 18, 2020 at 6:42 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, Jun 15, 2020 at 4:39 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > The above tests were run with the configuration attached config.txt, which is the same used for performance tests
ofcsv/text files posted earlier in this mail chain.
 
> >
> > Request the community to take this patch up for review along with the parallel copy for csv/text file patches and
providefeedback.
 
> >
>
> I had reviewed the patch, few comments:
>
>  The new members added should be present in ParallelCopyData

Added to ParallelCopyData.

>
> line_size can be set as and when we process the tuple from
> CopyReadBinaryTupleLeader and this can be set at the end. That way the
> above code can be removed.
>

curr_tuple_start_info and curr_tuple_end_info variables are now local
variables to CopyReadBinaryTupleLeader and the line size calculation
code is moved to CopyReadBinaryAttributeLeader.

>
> curr_block_pos variable is present in ParallelCopyShmInfo, we could
> use it and remove from here.
> curr_data_offset, similar variable raw_buf_index is present in
> CopyStateData, we could use it and remove from here.
>

Yes, making use of them now.

>
> This code is duplicate in CopyReadBinaryTupleLeader &
> CopyReadBinaryAttributeLeader. We could make a function and re-use.
>

Added a new function AdjustFieldInfo.

>
> column_no is not used, it can be removed
>

Removed.

>
> The above code is present in NextCopyFrom & CopyReadBinaryTupleLeader,
> check if we can make a common function or we could use NextCopyFrom as
> it is.
>

Added a macro CHECK_FIELD_COUNT.

> +       if (fld_count == -1)
> +       {
> +               return true;
> +       }
>
> Should this be an assert in CopyReadBinaryTupleWorker function as this
> check is already done in the leader.
>

This check in leader signifies the end of the file. For the workers,
the eof is when GetLinePosition() returns -1.
    line_pos = GetLinePosition(cstate);
    if (line_pos == -1)
        return true;
In case the if (fld_count == -1) is encountered in the worker, workers
should just return true from CopyReadBinaryTupleWorker marking eof.
Having this as an assert doesn't serve the purpose I feel.

Along with the review comments addressed
patch(0006-Parallel-Copy-For-Binary-Format-Files.patch) also attaching
all other latest series of patches(0001 to 0005) from [1], the order
of applying patches is from 0001 to 0006.

[1] https://www.postgresql.org/message-id/CALDaNm0H3N9gK7CMheoaXkO99g%3DuAPA93nSZXu0xDarPyPY6sg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Improve handling of parameter differences in physical replication
Следующее
От: Vik Fearing
Дата:
Сообщение: Re: Allow CURRENT_ROLE in GRANTED BY