Thread: Speed up COPY TO text/CSV parsing using SIMD


Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
Hello,
Following Nazir's recommendation, I'm moving this to a different thread so it can be looked at separately.

On Thu, Jan 8, 2026 at 2:49 PM Manni Wood <manni.wood@enterprisedb.com> wrote:
On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>
> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip over data until the first special character is found, then falls back to scalar processing for that character and re-enters the SIMD path again...
> There are two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue on the scalar path without re-entering SIMD again.
> - This gives 10% to 30% speedups depending on the share of special characters in the attribute; we don't lose anything here since it advances with SIMD until it can't (using the previous scripts: 1/3, 2/3 special chars).
>
> 2) Do the SIMD path, then use the scalar path when we hit a special character, re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story; we'll need to find the same heuristic for both COPY FROM/TO to reduce the regressions (same regressions: around 20% to 30% with 1/3, 2/3 special chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the second one.

Ayoub Kazar, I tested your v4 "copy to" patch, doing everything in RAM, and using the cpupower tips from above. (I wanted to test your v5, but `git apply --check` gave me an error, so I can look at that another day.)

The results look great:

master: (forgot to get commit hash)

text, no special: 8165
text, 1/3 special: 22662
csv, no special: 9619
csv, 1/3 special: 23213

v4 (copy to)

text, no special: 4577 (43.9% speedup)
text, 1/3 special: 22847 (0.8% regression)
csv, no special: 4720 (50.9% speedup)
csv, 1/3 special: 23195 (0.07% regression)

Seems like a very clear win to me!
-- Manni Wood EDB: https://www.enterprisedb.com

Optimizing COPY FROM using SIMD is still under review, but for COPY TO, applying the same ideas turned out to be trivial; the attached patch gives very nice speedups, as confirmed by Manni's benchmarks.
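For readers skimming the thread, option 1 (skip ahead in chunks, handle the special character in scalar code) can be sketched standalone. This is only an illustration: the real patch uses PostgreSQL's Vector8 helpers from src/include/port/simd.h, which fall back to 8-byte SWAR words on platforms without vector support, and that fallback style is what the sketch mimics; the function name and the escaped-character set are simplified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Broadcast byte c into all 8 lanes of a 64-bit word. */
#define BCAST(c)   ((uint64_t) 0x0101010101010101ULL * (uint8_t) (c))
/* Nonzero iff some byte of v is zero (classic SWAR trick). */
#define HASZERO(v) (((v) - 0x0101010101010101ULL) & ~(v) & 0x8080808080808080ULL)

/* Return the offset of the first byte needing escaping in TEXT format
 * (backslash, newline, carriage return, tab here for brevity), or len
 * if there is none.  Chunks of 8 bytes are tested at once; on a hit we
 * drop to a scalar scan of that chunk, as option 1 describes. */
static size_t
skip_plain_text(const char *s, size_t len)
{
    size_t i = 0;

    for (; i + 8 <= len; i += 8)
    {
        uint64_t chunk;

        memcpy(&chunk, s + i, 8);        /* unaligned-safe load */
        if (HASZERO(chunk ^ BCAST('\\')) || HASZERO(chunk ^ BCAST('\n')) ||
            HASZERO(chunk ^ BCAST('\r')) || HASZERO(chunk ^ BCAST('\t')))
            break;                       /* special char in this chunk */
    }
    /* scalar scan: the tail, or the chunk that contained a hit */
    for (; i < len; i++)
        if (s[i] == '\\' || s[i] == '\n' || s[i] == '\r' || s[i] == '\t')
            break;
    return i;
}
```

On a real SIMD target the 8-byte word becomes a 16-byte vector, but the control flow (skip in chunks, fall back to scalar on a hit) is the same.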


Regards,
Ayoub
Attachments

Re: Speed up COPY TO text/CSV parsing using SIMD

From: Andres Freund
Date:
Hi,

On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> Currently optimizing COPY FROM using SIMD is still under review, but for
> the case of COPY TO using the same ideas, we found that the problem is
> trivial, the attached patch gives very nice speedups as confirmed by
> Manni's benchmarks.

I have a hard time believing that adding a strlen() to the handling of a short
column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are any
to-be-escaped characters.

I also don't think it's good how much code this repeats. I think you'd have to
start with preparatory moving the exiting code into static inline helper
functions and then introduce SIMD into those.

Greetings,

Andres Freund



Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
Hi,

On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> Currently optimizing COPY FROM using SIMD is still under review, but for
> the case of COPY TO using the same ideas, we found that the problem is
> trivial, the attached patch gives very nice speedups as confirmed by
> Manni's benchmarks.

I have a hard time believing that adding a strlen() to the handling of a short
column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are any
to-be-escaped characters.
Thanks for pointing that out. Here's what I did:
1) In the previous patch, strlen() was called twice if a CSV attribute needed quoting; the attached patch computes the length once at the beginning and reuses it for both SIMD paths, so basically one call.
2) If an attribute needs encoding conversion, we have to recalculate the string length because it can grow (so two calls at most in all cases).
3) To simulate the worst case, I benchmarked this against master for tables with 100, 500, and 1000 columns, all integers, where one would want to process the whole row in a single pass rather than compute the length of such short attributes:
1000 columns:
TEXT: 17% regression
CSV: 3.4% regression

500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression

100 columns: 
TEXT: 17.3% regression
CSV: 3% regression

The results are a bit unstable, but the overhead for worst cases like this is really significant. I can't judge whether it's worth it or not, so thoughts on this?

I also don't think it's good how much code this repeats. I think you'd have to
start with preparatory moving the exiting code into static inline helper
functions and then introduce SIMD into those.
Done, though I'm not sure this is the right place to put it; let me know.


Regards,
Ayoub
Attachments

Re: Speed up COPY TO text/CSV parsing using SIMD

From: Nathan Bossart
Date:
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
>> I have a hard time believing that adding a strlen() to the handling of a
>> short column won't be a measurable overhead with lots of short attributes.
>> Particularly because the patch afaict will call it repeatedly if there are
>> any to-be-escaped characters.
> 
> [...]
> 
> 1000 columns:
> TEXT: 17% regression
> CSV: 3.4% regression
> 
> 500 columns:
> TEXT: 17.7% regression
> CSV: 3.1% regression
> 
> 100 columns:
> TEXT: 17.3% regression
> CSV: 3% regression
> 
> A bit unstable results, but yeah the overhead for worse cases like this is
> really significant, I can't argue whether this is worth it or not, so
> thoughts on this ?

I seriously doubt we'd commit something that produces a 17% regression
here.  Perhaps we should skip the SIMD paths whenever transcoding is
required.

-- 
nathan



Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
Hello,
On Tue, Mar 10, 2026 at 8:17 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
>> I have a hard time believing that adding a strlen() to the handling of a
>> short column won't be a measurable overhead with lots of short attributes.
>> Particularly because the patch afaict will call it repeatedly if there are
>> any to-be-escaped characters.
>
> [...]
>
> 1000 columns:
> TEXT: 17% regression
> CSV: 3.4% regression
>
> 500 columns:
> TEXT: 17.7% regression
> CSV: 3.1% regression
>
> 100 columns:
> TEXT: 17.3% regression
> CSV: 3% regression
>
> A bit unstable results, but yeah the overhead for worse cases like this is
> really significant, I can't argue whether this is worth it or not, so
> thoughts on this ?

I seriously doubt we'd commit something that produces a 17% regression
here.  Perhaps we should skip the SIMD paths whenever transcoding is
required.

--
nathan
I've spent some time rethinking this, and here's what I've done in v3:
SIMD is only used for varlena attributes whose text representation is longer than a single SIMD vector, and only when no transcoding is required.
Fixed-size types such as integers mostly produce short ASCII output, for which SIMD provides no benefit.

For eligible attributes, the stored varlena size is used as a cheap pre-filter to avoid an unnecessary strlen() call on short values.

Here are the benchmark results after many runs compared to master (4deecb52aff):
TEXT clean: -34.0%
CSV clean: -39.3%
TEXT 1/3: +4.7%
CSV 1/3: -2.3%
The above numbers vary by 1% to 3% in either direction across 20+ runs.

WIDE tables short attributes TEXT: 
50 columns: -3.7% 
100 columns: -1.7% 
200 columns: +1.8% 
500 columns: -0.5% 
1000 columns: -0.3%

WIDE tables short attributes CSV: 
50 columns: -2.5%
100 columns: +1.8%
200 columns: +1.4% 
500 columns: -0.9% 
1000 columns: -1.1%

The wide-table benchmarks were all similarly noisy; across 20+ runs the results always land between -2% and +4% for every column count.

Just a small concern about cases where a varlena's binary size is larger than its text representation, e.g.:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
 pg_column_size
----------------
             32

Its text representation is shorter than sizeof(Vector8), so v3 currently enters the SIMD path and bails out right at the beginning (two extra branches),
because it does this:
+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+ VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could require 2x or 4x the binary size; it really depends on the type, but this is just a suggestion in case this is a concern.

Thoughts?


Regards,
Ayoub
Attachments

Re: Speed up COPY TO text/CSV parsing using SIMD

From: Nathan Bossart
Date:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
> 
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> 
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +                                                            bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +                                                           bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.

>              if (is_csv)
> -                CopyAttributeOutCSV(cstate, string,
> -                                    cstate->opts.force_quote_flags[attnum - 1]);
> +            {
> +                if (use_simd)
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        true, len);
> +                else
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        false, len);
> +            }
>              else
> -                CopyAttributeOutText(cstate, string);
> +            {
> +                if (use_simd)
> +                    CopyAttributeOutText(cstate, string, true, len);
> +                else
> +                    CopyAttributeOutText(cstate, string, false, len);
> +            }

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.

-- 
nathan



Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?
I'll respond to this separately in a different email.

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +                                                            bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +                                                           bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
I tested the inlining alone and found an improvement of about 1% to 4% across all configurations.
The inlining is only meaningful in combination with the SIMD work, for the reason described below. 

>              if (is_csv)
> -                CopyAttributeOutCSV(cstate, string,
> -                                    cstate->opts.force_quote_flags[attnum - 1]);
> +            {
> +                if (use_simd)
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        true, len);
> +                else
> +                    CopyAttributeOutCSV(cstate, string,
> +                                        cstate->opts.force_quote_flags[attnum - 1],
> +                                        false, len);

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it
I compiled three variants:

v3: use_simd passed as a compile-time constant, CopyAttribute functions inlined.
v3_variable: use_simd passed as a variable, CopyAttribute functions inlined.
v3_variable_noinline: use_simd passed as a variable, CopyAttribute functions not inlined.

None of the SIMD helpers are explicitly inlined by us.

The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD) are naturally inlined by the compiler in all three variants; CopySkipTextSIMD is never inlined in any variant.

2) The constant-emitting approach (v3) does matter (just a little, apparently), specifically for CopySkipTextSIMD.
It's the same story as the COPY FROM patch's first commit: the compiler emits code without the use_simd branch:
     jbe  ...   ; len > sizeof(Vector8)
     je   ...   ; need_transcoding
     call CopySkipTextSIMD

Whether the extra branch avoided by constant passing is worth it is demonstrated by the benchmarks.
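The specialization trick being compared can be illustrated with a toy sketch (names hypothetical; GNU attribute syntax, which is roughly what pg_attribute_always_inline expands to on GCC/Clang): a constant use_simd at an always-inlined call site lets the compiler fold the branch away.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* With a compile-time-constant use_simd argument at an always-inlined
 * call site, the compiler constant-folds the branch and emits two
 * specialized copies with no run-time use_simd test. */
static inline __attribute__((always_inline)) size_t
scan_attr(const char *s, size_t len, bool use_simd)
{
    if (use_simd)
    {
        /* stand-in for the vectorized scan */
        const char *hit = memchr(s, '\\', len);

        return hit ? (size_t) (hit - s) : len;
    }
    else
    {
        size_t i;

        for (i = 0; i < len && s[i] != '\\'; i++)
            ;
        return i;
    }
}

/* Each wrapper compiles down to one branch-free specialization. */
size_t scan_fast(const char *s, size_t len)  { return scan_attr(s, len, true); }
size_t scan_plain(const char *s, size_t len) { return scan_attr(s, len, false); }
```

Passing use_simd as a run-time variable instead keeps a single copy with the test inside, which is the v3_variable shape discussed here.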


  Test                 Master    v3       v3_var   v3_var_noinl
  TEXT clean           1504ms   -24.1%   -23.0%   -21.5%
  CSV clean            1760ms   -34.9%   -32.7%   -33.0%
  TEXT 1/3 backslashes 3763ms    +4.6%    +6.9%    +4.1%
  CSV 1/3 quotes       3885ms    +3.1%    +2.7%    -0.8%

Wide table TEXT (integer columns):

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2083ms   -0.7%    -0.6%    +3.5%
  100     4094ms   -0.1%    -0.5%    +4.5%
  200     1560ms   +0.6%    -2.3%    +3.2%
  500     1905ms   -1.0%    -1.3%    +4.7%
  1000    1455ms   +1.8%    +0.4%    +4.3%

Wide table CSV:

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2421ms   +4.0%    +6.7%    +5.8%
  100     4980ms   +0.1%    +2.0%     +0.1%
  200     1901ms   +1.4%    +3.5%    +1.4%
  500     2328ms   +1.8%    +2.7%    +2.2%
  1000    1815ms   +2.0%    +2.8%    +2.5%

I'm not sure there's a practical difference between v3 and v3_var; what do you think?


Regards,
Ayoub

Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?
I'll respond to this separately in a different email.
My example was actually incorrect (a tsvector's text representation is lexemes and positions, not the original text as-is; it's lossy), but the point still holds.
If we have a json(b) column like {"key1":"val1","key2":"val2"}, CSV format would immediately exit the SIMD path because of the quote character, and for json(b) this will always be the case.
I measured the overhead of exiting the SIMD path repeatedly (8 million times for one COPY TO command) and found only a 3% regression for this case, sometimes 2%.

For cases where we falsely commit to SIMD because the binary size reads >= sizeof(Vector8), which I also found to be very niche, the short-circuit to scalar each time is even more negligible (the CSV JSON case above is the absolute worst case).
So I don't think any of this should be a concern.


Regards,
Ayoub

Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
On Wed, Mar 18, 2026 at 3:29 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
[...]
Rebased patch.

Regards,
Ayoub
Attachments

Re: Speed up COPY TO text/CSV parsing using SIMD

From: Nathan Bossart
Date:
On Wed, Mar 18, 2026 at 12:02:28AM +0100, KAZAR Ayoub wrote:
>   Test                 Master    v3       v3_var   v3_var_noinl
>   TEXT clean           1504ms   -24.1%   -23.0%   -21.5%
>   CSV clean            1760ms   -34.9%   -32.7%   -33.0%

Nice!

>   TEXT 1/3 backslashes 3763ms    +4.6%    +6.9%    +4.1%
>   CSV 1/3 quotes       3885ms    +3.1%    +2.7%    -0.8%

Hm.  These seem a little bit beyond what we could ignore as noise.

> Wide table TEXT (integer columns):
> 
>   Cols    Master    v3       v3_var   v3_var_noinl
>   50      2083ms   -0.7%    -0.6%    +3.5%
>   100     4094ms   -0.1%    -0.5%    +4.5%
>   200     1560ms   +0.6%    -2.3%    +3.2%
>   500     1905ms   -1.0%    -1.3%    +4.7%
>   1000    1455ms   +1.8%    +0.4%    +4.3%

These numbers look roughly within the noise range.

> Wide table CSV:
> 
>   Cols    Master    v3       v3_var   v3_var_noinl
>   50      2421ms   +4.0%    +6.7%    +5.8%

Hm.  Is this reproducible?  A 4% regression is a bit worrisome.

>   100     4980ms   +0.1%    +2.0%     +0.1%
>   200     1901ms   +1.4%    +3.5%    +1.4%
>   500     2328ms   +1.8%    +2.7%    +2.2%
>   1000    1815ms   +2.0%    +2.8%    +2.5%

These numbers don't bother me too much, but maybe there are some ways to
minimize the regressions further.

-- 
nathan



Re: Speed up COPY TO text/CSV parsing using SIMD

From: Nathan Bossart
Date:
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads.  Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes.  The extra branching for each attribute
might not be something we can just ignore.

> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).

That's good to hear.

-- 
nathan



Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
Hello,
On Thu, Mar 26, 2026 at 10:23 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads.  Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes.  The extra branching for each attribute
might not be something we can just ignore.
Thanks for the review.
 
I added a prescan loop inside the SIMD helpers that looks for special characters within the first sizeof(Vector8) bytes, and measured how well it reduces the overhead of entering SIMD only to exit at the first vector:
the scalar loop beats SIMD for one vector if it finds a special character before roughly the 6th character; the worst case is when no special character is found, where the scalar loop needs about 20 more cycles than SIMD.
This helps mitigate the JSON(B)-in-CSV case, which is why I added it for the CSV path only.
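A sketch of what such a prescan gate might look like (everything here is illustrative, not the patch's actual code: the names, the 6-byte threshold taken from the measurement above, and the set of special characters are all assumptions):

```c
#include <stddef.h>

#define VEC_BYTES 16   /* sizeof(Vector8) on SSE2/Neon targets */
#define PRESCAN    6   /* crossover: scalar wins if a special char
                        * appears before about the 6th byte */

/* Decide whether entering the SIMD path is worthwhile for one CSV
 * attribute value: a cheap scalar peek at the first few bytes routes
 * values with an early quote/delimiter (e.g. JSON text in CSV output)
 * straight to the scalar path. */
static int
worth_simd_csv(const char *s, size_t len, char quote, char delim)
{
    size_t n = len < PRESCAN ? len : PRESCAN;
    size_t i;

    for (i = 0; i < n; i++)
        if (s[i] == quote || s[i] == delim || s[i] == '\n' || s[i] == '\r')
            return 0;           /* early special char: stay scalar */
    return len >= VEC_BYTES;    /* also too short for SIMD otherwise */
}
```

The point of the gate is that the few-byte scalar peek costs less than setting up a vector load only to abandon it at the first hit.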

In a benchmark with 10M early SIMD exits, like the JSONB case, the previous 3% regression is gone.

For the normal benchmark (clean, 1/3 specials, wide table), i ran for longer times for v4 now and i found this:
  Test                       Master    V4
  TEXT clean                 1619ms    -28.0%
  CSV clean                  1866ms    -37.1%
  TEXT 1/3 backslashes       3913ms    +1.2%
  CSV 1/3 quotes             4012ms    -3.0%

Wide table TEXT:

  Cols    Master    V4
  50      2109ms    -2.9%
  100     2029ms    -1.6%
  200     3982ms    -2.9%
  500     1962ms    -6.1%
  1000    3812ms    -3.6%

Wide table CSV:

  Cols    Master    V4
  50      2531ms    +0.3%
  100     2465ms    +1.1%
  200     4965ms    -0.2%
  500     2346ms    +1.4%
  1000    4709ms    -0.4%

Do we need benchmarks for other kinds of workloads? Or is there something else with noticeable overhead that I'm missing?

Regards,
Ayoub
Attachments

Re: Speed up COPY TO text/CSV parsing using SIMD

From: Nathan Bossart
Date:
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the simd helpers trying to catch special
> chars in sizeof(Vector8) characters, i measured how good is this at
> reducing the overhead of starting simd and exiting at first vector:
> the scalar loop is better than SIMD for one vector if it finds a special
> character before 6th character, worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, this is why I only
> added this for CSV case only.

Interesting.

> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.

-- 
nathan



Re: Speed up COPY TO text/CSV parsing using SIMD

From: KAZAR Ayoub
Date:
On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the simd helpers trying to catch special
> chars in sizeof(Vector8) characters, i measured how good is this at
> reducing the overhead of starting simd and exiting at first vector:
> the scalar loop is better than SIMD for one vector if it finds a special
> character before 6th character, worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, this is why I only
> added this for CSV case only.

Interesting.

> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.
Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure we're not missing anything.

Regards,
Ayoub