Re: Speed up COPY FROM text/CSV parsing using SIMD
| From | Manni Wood |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CAKWEB6r1a3yacHx8bAM3qfpUps4=rm+uUC7JxBfx3P5J0r_SdA@mail.gmail.com |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (KAZAR Ayoub <ma_kazar@esi.dz>) |
| List | pgsql-hackers |
On Wed, Feb 4, 2026 at 8:29 AM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
> Hello,
>
> On Wed, Feb 4, 2026, 6:38 AM Manni Wood <manni.wood@enterprisedb.com> wrote:
>
>> The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems nice! On my x86 PC, it had the usual performance improvement of earlier patches, but the regression seemed more similar for both text and CSV inputs. Unfortunately, the regression is about 2.5%, but maybe that is an acceptable worst case for an improvement of 22% for text inputs and 33% for CSV inputs?
>>
>> The 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch looks even better on my Raspberry Pi's ARM processor: not only do we see a 22% improvement for text and an almost 34% improvement for CSV, even the worst-case scenarios show an almost 4% improvement for text and an 11.7% improvement for CSV.
>>
>> By comparison, the v5.1-0001-Simple-heuristic-for-SIMD-COPY-FROM.patch.patch's worst-case performance is poorer on both architectures.
>>
>> I'd be curious to know if anyone else can reproduce these numbers. 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch seems like a real winner.
>
> Thanks for the benchmark, Manni. I suppose this is with the same threshold as the patch has (4096 bytes); have you tried any bigger values for the threshold?
>
> I'm still expecting fewer L1d cache misses and lower execution times the more we increase the threshold (relative to the L1d cache size per core).
>
> As per my previous not-so-stable numbers, 28KB wasn't too bad.
>
> Regards,
> Ayoub

Ah, thanks for the prod, Ayoub. You are correct: The results in my previous e-mail for the 0001-COPY-from-SIMD-v3-with-line_buf-periodic-refill.patch patch are with LINE_BUF_FLUSH_AFTER set to its default of 4096. I will try to measure what happens for larger LINE_BUF_FLUSH_AFTER values, hopefully some time this week.
Best,
-Manni
--
Manni Wood
EDB: https://www.enterprisedb.com