Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id CAD21AoAkSzEbkUp8fg1pzCFDFiZNPWXM+=rNWXhgC8TYp88Uvg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Thu, Jan 25, 2024 at 1:53 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> > Hmm I can see a similar trend that Suto-san had; the binary format got
> > slightly faster whereas both text and csv format has small regression
> > (4%~5%). I think that the improvement for binary came from the fact
> > that we removed "if (cstate->opts.binary)" branches from the original
> > CopyOneRowTo(). I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Interesting.
>
> Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
> 0.9% speedup for binary, which may be around the noise range assuming
> a ~1% range.  While this does not imply a regression, that seems worth
> the duplication IMO.

Agreed. In addition to that, now that each format routine has its own
callbacks, there would be chances that we can do other optimizations
dedicated to the format type in the future if available.

>  The patch had better document the reason why the
> split is done, as well.

+1

>
> CopyFromTextOneRow() has also specific branches for binary and
> non-binary removed in 0005, so assuming that I/O is not a bottleneck,
> the operation would be faster because we would not evaluate this "if"
> condition for each row.  Wouldn't we also see improvements for COPY
> FROM with short row values, say when mounting PGDATA into a
> tmpfs/ramfs?

Probably. Seems worth evaluating.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: "Shankaran, Akash"
Дата:
Сообщение: RE: Popcount optimization using AVX512