Re: [PATCH] Performance Improvement For Copy From Binary Files

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: [PATCH] Performance Improvement For Copy From Binary Files
Дата
Msg-id CA+HiwqHz=xE2HHYARyem7SDnnNVOV9_wRD5SfEgbSk60Ad7C1g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Performance Improvement For Copy From Binary Files  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sun, Jul 26, 2020 at 6:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ]
>
> Pushed with cosmetic changes.

Thanks for that.

> I'd always supposed that stdio does enough internal buffering that short
> fread()s shouldn't be much worse than memcpy().  But I reproduced your
> result of ~30% speedup for data with a lot of narrow text columns, using
> RHEL 8.2.  Thinking this an indictment of glibc, I also tried it on
> current macOS, and saw an even bigger speedup, approaching 50%.  So
> there's definitely something to this.  I wonder if we ought to check
> other I/O-constrained users of fread and fwrite, like pg_dump/pg_restore.

Ah, maybe a good idea to check that.

> A point that I did not see addressed in the thread is whether this
> has any negative impact on the copy-from-frontend code path, where
> there's no fread() to avoid; short reads from CopyGetData() are
> already going to be satisfied by memcpy'ing from the fe_msgbuf.
> However, a quick check suggests that this patch is still a small
> win for that case too --- apparently the control overhead in
> CopyGetData() is not negligible.

Indeed.

> So the patch seems fine functionally, but there were some cosmetic
> things I didn't like:
>
> * Removing CopyGetInt32 and CopyGetInt16 seemed like a pretty bad
> idea, because it made the callers much uglier and more error-prone.
> This is a particularly bad example:
>
>                 /* Header extension length */
> -               if (!CopyGetInt32(cstate, &tmp) ||
> -                       tmp < 0)
> +               if (CopyReadBinaryData(cstate, (char *) &tmp, sizeof(tmp)) !=
> +                       sizeof(tmp) || (tmp = (int32) pg_ntoh32(tmp)) < 0)
>
> Putting side-effects into late stages of an if-condition is just
> awful coding practice.  They're easy for a reader to miss and they
> are magnets for bugs, because of the possibility that control doesn't
> reach that part of the condition.
>
> You can get the exact same speedup without any of those disadvantages
> by marking these two functions "inline", so that's what I did.
>
> * I dropped the DRAIN_COPY_RAW_BUF macro too, as in my estimation it was
> a net negative for readability.  With only two use-cases, having it made
> the code longer not shorter; I was also pretty unconvinced about the
> wisdom of having some of the loop's control logic inside the macro and
> some outside.
>
> * BTW, the macro definitions weren't particularly per project style
> anyway.  We generally put at least one space before line-ending
> backslashes.  I don't think pgindent will fix this for you; IME
> it doesn't touch macro definitions at all.
>
> * Did some more work on the comments.

Thanks for these changes.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: handle a ECPG_bytea typo
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Fast DSM segments