Re: Parallel copy

Поиск
Список
Период
Сортировка
От Greg Nancarrow
Тема Re: Parallel copy
Дата
Msg-id CAJcOf-fwxtoyOSKrGjybZtsU9Qx1sjHaM6gObUJqOZYC_maHgQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Ответы Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21@gmail.com> wrote:

> Attached v6 patch with the fixes.
>

Hi Vignesh,

I noticed a couple of issues when scanning the code in the following patch:

    v6-0003-Allow-copy-from-command-to-process-data-from-file.patch

In the following code, it will put a junk uint16 value into *destptr
(and thus may well cause a crash) on a Big Endian architecture
(Solaris Sparc, s390x, etc.):
You're storing a (uint16) string length in a uint32 and then pulling
out the lower two bytes of the uint32 and copying them into the
location pointed to by destptr.


static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+ uint32 *copiedsize)
+{
+ uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+ memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint16), srcPtr, len);
+ *copiedsize += len;
+ }
+}

I suggest you change the code to:

    uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
    memcpy(destptr, &len, sizeof(uint16));

[I assume string length here can't ever exceed (65535 - 1), right?]

Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.

Then you could safely use:

    uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
    *(uint16 *)destptr = len;
    *copiedsize += sizeof(uint16);
    if (len)
    {
        memcpy(destptr + sizeof(uint16), srcPtr, len);
        *copiedsize += len;
    }

and in the CopyStringFromSharedMemory() function, then could safely use:

    len = *(uint16 *)srcPtr;

The compiler may be smart enough to optimize-away the memcpy() in this
case anyway, but there are issues in doing this for architectures that
take a performance hit for unaligned access, or don't support
unaligned access.

Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
instead of palloc0(), as you're filling the entire palloc'd buffer
anyway, so no need to ask for additional MemSet() of all buffer bytes
to 0 prior to memcpy().


Regards,
Greg Nancarrow
Fujitsu Australia



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "tsunakawa.takay@fujitsu.com"
Дата:
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist
Следующее
От: "Hou, Zhijie"
Дата:
Сообщение: RE: [PATCH] Keeps tracking the uniqueness with UniqueKey