Discussion: Speed up COPY TO text/CSV parsing using SIMD
Hello,
Following Nazir's recommendation to move this to a different thread so it can be looked at separately.
On Thu, Jan 8, 2026 at 2:49 PM Manni Wood <manni.wood@enterprisedb.com> wrote:
On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>
> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip data and advance until the first special character is found, then falls back to scalar processing for that character and re-enters the SIMD path again...
> There are two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue on the scalar path without re-entering SIMD again.
> - This gives 10% to 30% speedups depending on the weight of special characters in the attribute; we don't lose anything here since it advances with SIMD until it can't (using the previous scripts: 1/3, 2/3 special chars).
>
> 2) Do the SIMD path, then use the scalar path when we hit a special character, re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story; we'll need to find the same heuristic for both COPY FROM/TO to reduce the regressions (same regressions: around 20% to 30% with 1/3, 2/3 special chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the second one.

Ayoub Kazar, I tested your v4 "copy to" patch, doing everything in RAM, and using the cpupower tips from above. (I wanted to test your v5, but `git apply --check` gave me an error, so I can look at that another day.)

The results look great:

master: (forgot to get commit hash)
text, no special: 8165
text, 1/3 special: 22662
csv, no special: 9619
csv, 1/3 special: 23213
v4 (copy to)
text, no special: 4577 (43.9% speedup)
text, 1/3 special: 22847 (0.8% regression)
csv, no special: 4720 (50.9% speedup)
csv, 1/3 special: 23195 (0.07% regression)

Seems like a very clear win to me!

--
Manni Wood
EDB: https://www.enterprisedb.com
Currently, optimizing COPY FROM using SIMD is still under review, but for the case of COPY TO, using the same ideas we found that the problem is trivial; the attached patch gives very nice speedups, as confirmed by Manni's benchmarks.
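For context, the core scanning idea (advance in chunks until the first special character, then fall back to scalar) can be sketched roughly like this. This is a hypothetical, self-contained sketch using a portable SWAR chunk scan as a stand-in for the patch's Vector8 helpers; all names here are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of option 1: advance 8 bytes at a time until a
 * chunk may contain a special character, then finish with the scalar
 * loop.  Specials for COPY TO text format are '\\', '\n', '\r', '\t'
 * and the column delimiter. */

/* SWAR helper: nonzero iff some byte of v is zero (exact, no false
 * positives). */
static uint64_t haszero(uint64_t v)
{
    return (v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL;
}

/* Nonzero iff some byte of v equals c. */
static uint64_t hasbyte(uint64_t v, unsigned char c)
{
    return haszero(v ^ (0x0101010101010101ULL * c));
}

/* Return the offset of the first special character, or len if none. */
static size_t first_special(const char *s, size_t len, char delim)
{
    size_t i = 0;

    while (i + 8 <= len)
    {
        uint64_t chunk;

        memcpy(&chunk, s + i, 8);       /* unaligned-safe load */
        if (hasbyte(chunk, '\\') || hasbyte(chunk, '\n') ||
            hasbyte(chunk, '\r') || hasbyte(chunk, '\t') ||
            hasbyte(chunk, (unsigned char) delim))
            break;                      /* chunk is dirty: go scalar */
        i += 8;
    }
    /* scalar tail: pinpoint the special character (or hit the end) */
    for (; i < len; i++)
    {
        char c = s[i];

        if (c == '\\' || c == '\n' || c == '\r' || c == '\t' || c == delim)
            return i;
    }
    return len;
}
```

A real Vector8 implementation scans 16 bytes per step and re-enters the wide loop after handling the special character (option 2), but the clean-path/dirty-path split is the same.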
Regards,
Ayoub
Attachments
Hi,

On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> Currently optimizing COPY FROM using SIMD is still under review, but for
> the case of COPY TO using the same ideas, we found that the problem is
> trivial, the attached patch gives very nice speedups as confirmed by
> Manni's benchmarks.

I have a hard time believing that adding a strlen() to the handling of a
short column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are
any to-be-escaped characters.

I also don't think it's good how much code this repeats. I think you'd have
to start with preparatory work moving the existing code into static inline
helper functions and then introduce SIMD into those.

Greetings,

Andres Freund
Hi,
On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> Currently optimizing COPY FROM using SIMD is still under review, but for
> the case of COPY TO using the same ideas, we found that the problem is
> trivial, the attached patch gives very nice speedups as confirmed by
> Manni's benchmarks.
I have a hard time believing that adding a strlen() to the handling of a short
column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are any
to-be-escaped characters.
Thanks for pointing that out; here's what I did:
1) In the previous patch, strlen was called twice if a CSV attribute needed to add a quote; the attached patch gets the length at the beginning and uses it for both SIMD paths, so basically one call.
2) If an attribute needs encoding conversion we need to recalculate the string length because it can grow (so 2 calls at maximum in all cases).
3) Supposing the very worst cases, I benchmarked this against master for tables that have 100, 500, and 1000 columns, all integers only, so one would want to process the whole thing in a single pass rather than calculating the length of such short attributes:
1000 columns:
TEXT: 17% regression
CSV: 3.4% regression
500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression
100 columns:
TEXT: 17.3% regression
CSV: 3% regression
The results are a bit unstable, but yes, the overhead for worst cases like this is really significant. I can't argue whether this is worth it or not, so thoughts on this?
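The one-strlen shape described in points (1) and (2) above might look roughly like this. This is a hypothetical sketch with made-up helper names, not the patch's actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: compute the attribute length once at the top of
 * the out routine and pass it down, instead of each helper calling
 * strlen() on its own. */

/* Scalar quote-need check reusing a precomputed length. */
static bool csv_needs_quoting(const char *s, size_t len, char quote, char delim)
{
    for (size_t i = 0; i < len; i++)
    {
        char c = s[i];

        if (c == quote || c == delim || c == '\n' || c == '\r')
            return true;
    }
    return false;
}

/* Entry point: one strlen() call serves both the quote decision and
 * any later escape/SIMD scan.  Only if the value needed an encoding
 * conversion (which can grow the string) would the length have to be
 * recomputed. */
static bool csv_attribute_out_quoted(const char *s)
{
    size_t len = strlen(s);     /* the single strlen() call */

    return csv_needs_quoting(s, len, '"', ',');
}
```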
I also don't think it's good how much code this repeats. I think you'd have
to start with preparatory work moving the existing code into static inline
helper functions and then introduce SIMD into those.
Done, though I'm not too sure whether this is the right place to put it; let me know.
Regards,
Ayoub
Attachments
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
>> I have a hard time believing that adding a strlen() to the handling of a
>> short column won't be a measurable overhead with lots of short attributes.
>> Particularly because the patch afaict will call it repeatedly if there are
>> any to-be-escaped characters.
>
> [...]
>
> 1000 columns:
> TEXT: 17% regression
> CSV: 3.4% regression
>
> 500 columns:
> TEXT: 17.7% regression
> CSV: 3.1% regression
>
> 100 columns:
> TEXT: 17.3% regression
> CSV: 3% regression
>
> A bit unstable results, but yeah the overhead for worse cases like this is
> really significant, I can't argue whether this is worth it or not, so
> thoughts on this ?

I seriously doubt we'd commit something that produces a 17% regression
here. Perhaps we should skip the SIMD paths whenever transcoding is
required.

--
nathan
Hello,
On Tue, Mar 10, 2026 at 8:17 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <andres@anarazel.de> wrote:
>> I have a hard time believing that adding a strlen() to the handling of a
>> short column won't be a measurable overhead with lots of short attributes.
>> Particularly because the patch afaict will call it repeatedly if there are
>> any to-be-escaped characters.
>
> [...]
>
> 1000 columns:
> TEXT: 17% regression
> CSV: 3.4% regression
>
> 500 columns:
> TEXT: 17.7% regression
> CSV: 3.1% regression
>
> 100 columns:
> TEXT: 17.3% regression
> CSV: 3% regression
>
> A bit unstable results, but yeah the overhead for worse cases like this is
> really significant, I can't argue whether this is worth it or not, so
> thoughts on this ?
I seriously doubt we'd commit something that produces a 17% regression
here. Perhaps we should skip the SIMD paths whenever transcoding is
required.
--
nathan
I've spent some time rethinking this, and here's what I've done in v3:
SIMD is only used for varlena attributes whose text representation is longer than a single SIMD vector, and only when no transcoding is required.
Fixed-size types such as integers mostly produce short ASCII output for which SIMD provides no benefit.
For eligible attributes, the stored varlena size is used as a cheap pre-filter to avoid an unnecessary strlen() call on short values.
Here are the benchmark results after many runs compared to master (4deecb52aff):
TEXT clean: -34.0%
CSV clean: -39.3%
TEXT 1/3: +4.7%
CSV 1/3: -2.3%
The above numbers vary by 1% to 3% (improvement or regression) across 20+ runs.
WIDE tables short attributes TEXT:
50 columns: -3.7%
100 columns: -1.7%
200 columns: +1.8%
500 columns: -0.5%
1000 columns: -0.3%
WIDE tables short attributes CSV:
50 columns: -2.5%
100 columns: +1.8%
200 columns: +1.4%
500 columns: -0.9%
1000 columns: -1.1%
The wide-table benchmarks were all similar noise; across 20+ runs it's always between -2% and +4% for all numbers of columns.
Just a small concern about cases where some varlenas have a larger binary size than their text representation, e.g.:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
 pg_column_size
----------------
             32
Its text representation is shorter than sizeof(Vector8), so currently v3 would enter the SIMD path and exit right at the beginning (two extra branches), because it does this:
+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+ VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
I thought maybe we could multiply the binary size by 2 or 4; it really depends on the type, but this is just a suggestion in case this scenario is a concern.
Thoughts?
Regards,
Ayoub
Attachments
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> pg_column_size
> ----------------
> 32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?
> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> + bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> + bool use_quote, bool use_simd, size_t len);
Can you test this on its own, too? We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
> if (is_csv)
> - CopyAttributeOutCSV(cstate, string,
> - cstate->opts.force_quote_flags[attnum - 1]);
> + {
> + if (use_simd)
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + true, len);
> + else
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + false, len);
> + }
> else
> - CopyAttributeOutText(cstate, string);
> + {
> + if (use_simd)
> + CopyAttributeOutText(cstate, string, true, len);
> + else
> + CopyAttributeOutText(cstate, string, false, len);
> + }
There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference. As above, it would
be good to measure it.
--
nathan
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> pg_column_size
> ----------------
> 32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?
I'll respond to this separately in a different email.
> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> + bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> + bool use_quote, bool use_simd, size_t len);
Can you test this on its own, too? We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.
I tested inlining alone and found an improvement of about 1% to 4% across all configurations.
The inlining is only meaningful in combination with the SIMD work, for the reason described below.
> if (is_csv)
> - CopyAttributeOutCSV(cstate, string,
> - cstate->opts.force_quote_flags[attnum - 1]);
> + {
> + if (use_simd)
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + true, len);
> + else
> + CopyAttributeOutCSV(cstate, string,
> + cstate->opts.force_quote_flags[attnum - 1],
> + false, len);
There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference. As above, it would
be good to measure it.
I compiled three variants:
v3: use_simd passed as a compile-time constant, CopyAttribute functions inlined.
v3_variable: use_simd as a variable, CopyAttribute functions inlined.
v3_variable_noinline: use_simd as a variable, CopyAttribute functions not inlined.
None of the helpers are explicitly inlined by us.
The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD) are inlined naturally by the compiler in all three variants; CopySkipTextSIMD is never inlined by the compiler in any variant.
2) The constant-emitting approach (v3) does matter (just a little, apparently), specifically for CopySkipTextSIMD.
It's the same story as the COPY FROM patch's first commit: it just emits code without the use_simd branch:
jbe ... ; len > sizeof(Vector8)
je ... ; need_transcoding
call CopySkipTextSIMD
Whether the extra branching for constant passing is worth it is demonstrated by the benchmark.
Test Master v3 v3_var v3_var_noinl
TEXT clean 1504ms -24.1% -23.0% -21.5%
CSV clean 1760ms -34.9% -32.7% -33.0%
TEXT 1/3 backslashes 3763ms +4.6% +6.9% +4.1%
CSV 1/3 quotes 3885ms +3.1% +2.7% -0.8%
Wide table TEXT (integer columns):
Cols Master v3 v3_var v3_var_noinl
50 2083ms -0.7% -0.6% +3.5%
100 4094ms -0.1% -0.5% +4.5%
200 1560ms +0.6% -2.3% +3.2%
500 1905ms -1.0% -1.3% +4.7%
1000 1455ms +1.8% +0.4% +4.3%
Wide table CSV:
Cols Master v3 v3_var v3_var_noinl
50 2421ms +4.0% +6.7% +5.8%
100 4980ms +0.1% +2.0% +0.1%
200 1901ms +1.4% +3.5% +1.4%
500 2328ms +1.8% +2.7% +2.2%
1000 1815ms +2.0% +2.8% +2.5%
I'm not sure whether there's a practical difference between v3 and v3_var; what do you think?
Regards,
Ayoub
On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> pg_column_size
> ----------------
> 32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.
Can we measure the impact of this? How likely is this case?

I'll respond to this separately in a different email.
My example was already incorrect (the text representation is lexemes and positions, not the text as we read it; it's lossy), but the point still holds anyway.
If we have some json(b) column like {"key1":"val1","key2":"val2"}, for the CSV format this would immediately exit the SIMD path because of the quote character; for json(b) this is always going to be the case.
I measured the overhead of exiting the SIMD path a lot (8 million times for one COPY TO command); I only found a 3% regression for this case, sometimes 2%.
For cases where we make a false commitment to SIMD because we read a binary size >= sizeof(Vector8), which I also found very niche, the short circuit to scalar each time is even more negligible (the above CSV JSON case is the absolute worst case).
So I don't think any of this should be a concern.
Regards,
Ayoub
On Wed, Mar 18, 2026 at 3:29 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Rebased patch.
Regards,
Ayoub
Attachments
On Wed, Mar 18, 2026 at 12:02:28AM +0100, KAZAR Ayoub wrote:
> Test                  Master    v3       v3_var   v3_var_noinl
> TEXT clean            1504ms    -24.1%   -23.0%   -21.5%
> CSV clean             1760ms    -34.9%   -32.7%   -33.0%

Nice!

> TEXT 1/3 backslashes  3763ms    +4.6%    +6.9%    +4.1%
> CSV 1/3 quotes        3885ms    +3.1%    +2.7%    -0.8%

Hm. These seem a little bit beyond what we could ignore as noise.

> Wide table TEXT (integer columns):
>
> Cols   Master    v3      v3_var   v3_var_noinl
> 50     2083ms    -0.7%   -0.6%    +3.5%
> 100    4094ms    -0.1%   -0.5%    +4.5%
> 200    1560ms    +0.6%   -2.3%    +3.2%
> 500    1905ms    -1.0%   -1.3%    +4.7%
> 1000   1455ms    +1.8%   +0.4%    +4.3%

These numbers look roughly within the noise range.

> Wide table CSV:
>
> Cols   Master    v3      v3_var   v3_var_noinl
> 50     2421ms    +4.0%   +6.7%    +5.8%

Hm. Is this reproducible? A 4% regression is a bit worrisome.

> 100    4980ms    +0.1%   +2.0%    +0.1%
> 200    1901ms    +1.4%   +3.5%    +1.4%
> 500    2328ms    +1.8%   +2.7%    +2.2%
> 1000   1815ms    +2.0%   +2.8%    +2.5%

These numbers don't bother me too much, but maybe there are some ways to
minimize the regressions further.

--
nathan
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.
I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads. Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes. The extra branching for each attribute
might not be something we can just ignore.
> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).
That's good to hear.
--
nathan
Hello,
On Thu, Mar 26, 2026 at 10:23 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.
I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads. Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes. The extra branching for each attribute
might not be something we can just ignore.
Thanks for the review.
I added a prescan loop inside the SIMD helpers to try to catch special chars within the first sizeof(Vector8) characters, and I measured how well it reduces the overhead of entering SIMD and exiting at the first vector: the scalar loop beats SIMD for one vector if it finds a special character before the 6th character; the worst case is a vector that is not clean, where the scalar loop needs 20 more cycles compared to SIMD.
This helps mitigate the case of JSON(B) in CSV format, which is why I added it for the CSV case only.
In a benchmark with 10M early SIMD exits like the JSONB case, the previous 3% regression is gone.
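The prescan idea can be sketched as follows. This is a hypothetical illustration, not the v4 code; the names, the 16-byte width, and the CSV special set are all assumptions:

```c
#include <stdbool.h>
#include <stddef.h>

#define PRESCAN_WIDTH 16        /* stand-in for sizeof(Vector8) */

/* Hypothetical sketch of the v4 prescan: scan the first vector's worth
 * of bytes with the cheap scalar loop before committing to SIMD.  A
 * value like a JSON document in CSV format hits a quote almost
 * immediately and never pays the SIMD setup cost.  Returns the offset
 * where processing stopped: either the first special character, or the
 * end of the clean prefix. */
static size_t csv_prescan(const char *s, size_t len, char quote, char delim,
                          bool *enter_simd)
{
    size_t limit = len < PRESCAN_WIDTH ? len : PRESCAN_WIDTH;

    for (size_t i = 0; i < limit; i++)
    {
        char c = s[i];

        if (c == quote || c == delim || c == '\n' || c == '\r')
        {
            *enter_simd = false;        /* special found early: stay scalar */
            return i;
        }
    }
    /* first vector is clean and more data remains: SIMD is worth it */
    *enter_simd = (len > PRESCAN_WIDTH);
    return limit;
}
```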
For the normal benchmarks (clean, 1/3 specials, wide table), I ran v4 for longer this time and found:
Test Master V4
TEXT clean 1619ms -28.0%
CSV clean 1866ms -37.1%
TEXT 1/3 backslashes 3913ms +1.2%
CSV 1/3 quotes 4012ms -3.0%
Wide table TEXT:
Cols Master V4
50 2109ms -2.9%
100 2029ms -1.6%
200 3982ms -2.9%
500 1962ms -6.1%
1000 3812ms -3.6%
Wide table CSV:
Cols Master V4
50 2531ms +0.3%
100 2465ms +1.1%
200 4965ms -0.2%
500 2346ms +1.4%
1000 4709ms -0.4%
Do we need more benchmarks for some other kinds of workloads? Or is there maybe something else I'm missing that has noticeable overhead?
Regards,
Ayoub
Attachments
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the simd helpers trying to catch special
> chars in sizeof(Vector8) characters, i measured how good is this at
> reducing the overhead of starting simd and exiting at first vector:
> the scalar loop is better than SIMD for one vector if it finds a special
> character before 6th character, worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, this is why I only
> added this for CSV case only.

Interesting.

> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.

--
nathan
On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the simd helpers trying to catch special
> chars in sizeof(Vector8) characters, i measured how good is this at
> reducing the overhead of starting simd and exiting at first vector:
> the scalar loop is better than SIMD for one vector if it finds a special
> character before 6th character, worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, this is why I only
> added this for CSV case only.
Interesting.
> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.
While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.
--
nathan

Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure we're not missing anything.
Regards,
Ayoub