Re: Parallel copy

Поиск

Список

Период

Сортировка

От	vignesh C
Тема	Re: Parallel copy
Дата	19 августа 2020 г. 09:21:29
Msg-id	CALDaNm2QD5yAsMsgZ-Lr1rqGeCxLcQu7CVrS=Jy3AnGWKDS6NA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel copy (Greg Nancarrow <gregn4422@gmail.com>)
Ответы	Re: Parallel copy Re: Parallel copy
Список	pgsql-hackers

Дерево обсуждения

Thanks Greg for reviewing the patch. Please find my thoughts for your comments.

On Mon, Aug 17, 2020 at 9:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Some further comments:
>
> (1) v3-0002-Framework-for-leader-worker-in-parallel-copy.patch
>
> +/*
> + * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
> + * block to process to avoid lock contention. This value should be divisible by
> + * RINGSIZE, as wrap around cases is currently not handled while selecting the
> + * WORKER_CHUNK_COUNT by the worker.
> + */
> +#define WORKER_CHUNK_COUNT 50
>
>
> "This value should be divisible by RINGSIZE" is not a correct
> statement (since obviously 50 is not divisible by 10000).
> It should say something like "This value should evenly divide into
> RINGSIZE", or "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".
>

Fixed. Changed it to RINGSIZE should be a multiple of WORKER_CHUNK_COUNT.

> (2) v3-0003-Allow-copy-from-command-to-process-data-from-file.patch
>
> (i)
>
> +                       /*
> +                        * If the data is present in current block
> lineInfo. line_size
> +                        * will be updated. If the data is spread
> across the blocks either
>
> Somehow a space has been put between "lineinfo." and "line_size".
> It should be: "If the data is present in current block
> lineInfo.line_size will be updated"

Fixed, changed it to lineinfo->line_size.

>
> (ii)
>
> >This is not possible because of pg_atomic_compare_exchange_u32, this
> >will succeed only for one of the workers whose line_state is
> >LINE_LEADER_POPULATED, for other workers it will fail. This is
> >explained in detail above ParallelCopyLineBoundary.
>
> Yes, but prior to that call to pg_atomic_compare_exchange_u32(),
> aren't you separately reading line_state and line_state, so that
> between those reads, it may have transitioned from leader to another
> worker, such that the read line state ("cur_line_state", being checked
> in the if block) may not actually match what is now in the line_state
> and/or the read line_size ("dataSize") doesn't actually correspond to
> the read line state?
>
> (sorry, still not 100% convinced that the synchronization and checks
> are safe in all cases)
>

I think that you are describing about the problem could happen in the
following case:
when we read curr_line_state, the value was LINE_WORKER_PROCESSED or
LINE_WORKER_PROCESSING. Then in some cases if the leader is very fast
compared to the workers then the leader quickly populates one line and
sets the state to LINE_LEADER_POPULATED. State is changed to
LINE_LEADER_POPULATED when we are checking the currr_line_state.
I feel this will not be a problem because, Leader will populate & wait
till some RING element is available to populate. In the meantime
worker has seen that state is LINE_WORKER_PROCESSED or
LINE_WORKER_PROCESSING(previous state that it read), worker has
identified that this chunk was processed by some other worker, worker
will move and try to get the next available chunk & insert those
records. It will keep continuing till it gets the next chunk to
process. Eventually one of the workers will get this chunk and process
it.

> (3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch
>
> >raw_buf is not used in parallel copy, instead raw_buf will be pointing
> >to shared memory data blocks. This memory was allocated as part of
> >BeginCopyFrom, uptil this point we cannot be 100% sure as copy can be
> >performed sequentially like in case max_worker_processes is not
> >available, if it switches to sequential mode raw_buf will be used
> >while performing copy operation. At this place we can safely free this
> >memory that was allocated
>
> So the following code (which checks raw_buf, which still points to
> memory that has been pfreed) is still valid?
>
>   In the SetRawBufForLoad() function, which is called by CopyReadLineText():
>
>     cur_data_blk_ptr = (cstate->raw_buf) ?
> &pcshared_info->data_blocks[cur_block_pos] : NULL;
>
> The above code looks a bit dicey to me. I stepped over that line in
> the debugger when I debugged an instance of Parallel Copy, so it
> definitely gets executed.
> It makes me wonder what other code could possibly be checking raw_buf
> and using it in some way, when in fact what it points to has been
> pfreed.
>
> Are you able to add the following line of code, or will it (somehow)
> break logic that you are relying on?
>
> pfree(cstate->raw_buf);
> cstate->raw_buf = NULL;               <=== I suggest that this line is added
>

You are right, I have debugged & verified it sets it to an invalid
block which is not expected. There are chances this would have caused
some corruption in some machines. The suggested fix is required, I
have fixed it. I have moved this change to
0003-Allow-copy-from-command-to-process-data-from-file.patch as
0006-Parallel-Copy-For-Binary-Format-Files is only for Binary format
parallel copy & that change is common change for parallel copy.

I have attached new set of patches with the fixes.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Ashutosh Sharma
Дата: 19 августа 2020 г., 09:09:41
Сообщение: Re: recovering from "found xmin ... from before relfrozenxid ..."

Следующее

От: Fujii Masao
Дата: 19 августа 2020 г., 09:48:33
Сообщение: Re: Creating a function for exposing memory usage of backend process

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel copy

Вложения

Предыдущее

Следующее