Re: Parallel copy

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Parallel copy
Дата
Msg-id CAA4eK1LkGaad0Ucf8rR1MvxJfDD86od1s=FvWk4X6E2UeAq3Vg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel copy  (Greg Nancarrow <gregn4422@gmail.com>)
Ответы Re: Parallel copy  (Greg Nancarrow <gregn4422@gmail.com>)
Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21@gmail.com> wrote:
>
> > Attached v6 patch with the fixes.
> >
>
> Hi Vignesh,
>
> I noticed a couple of issues when scanning the code in the following patch:
>
>     v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
>
> In the following code, it will put a junk uint16 value into *destptr
> (and thus may well cause a crash) on a Big Endian architecture
> (Solaris Sparc, s390x, etc.):
> You're storing a (uint16) string length in a uint32 and then pulling
> out the lower two bytes of the uint32 and copying them into the
> location pointed to by destptr.
>
>
> static void
> +CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
> + uint32 *copiedsize)
> +{
> + uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
> +
> + memcpy(destptr, (uint16 *) &len, sizeof(uint16));
> + *copiedsize += sizeof(uint16);
> + if (len)
> + {
> + memcpy(destptr + sizeof(uint16), srcPtr, len);
> + *copiedsize += len;
> + }
> +}
>
> I suggest you change the code to:
>
>     uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
>     memcpy(destptr, &len, sizeof(uint16));
>
> [I assume string length here can't ever exceed (65535 - 1), right?]
>

Your suggestion makes sense to me if the assumption related to string
length is correct. If we can't ensure that then we need to probably
use four bytes uint32 to store the length.

> Looking a bit deeper into this, I'm wondering if in fact your
> EstimateStringSize() and EstimateNodeSize() functions should be using
> BUFFERALIGN() for EACH stored string/node (rather than just calling
> shm_toc_estimate_chunk() once at the end, after the length of packed
> strings and nodes has been estimated), to ensure alignment of start of
> each string/node. Other Postgres code appears to be aligning each
> stored chunk using shm_toc_estimate_chunk(). See the definition of
> that macro and its current usages.
>

I am not sure if this required for the purpose of correctness. AFAIU,
we do store/estimate multiple parameters in same way at other places,
see EstimateParamListSpace and SerializeParamList. Do you have
something else in mind?

While looking at the latest code, I observed below issue in patch
v6-0003-Allow-copy-from-command-to-process-data-from-file:

+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+ &rangeTableStr, &attnameListStr,
+ ¬nullListStr, &nullListStr,
+ &convertListStr);

Here, do we need to separately estimate the size of
SerializedParallelCopyState when it is also done in
EstimateCstateSize?

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Deng, Gang"
Дата:
Сообщение: RE: [PoC] Non-volatile WAL buffer
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: range_agg