Re: Speed up COPY FROM text/CSV parsing using SIMD
| From | Nazir Bilal Yavuz |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CAN55FZ0AYP4ZEczBJ5ur-=9QuEhMysH9Yfrq5srr0ZakK1M0FA@mail.gmail.com |
| In reply to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nathan Bossart <nathandbossart@gmail.com>) |
| Replies | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
Hi,

On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> > I think the problem is deciding how many lines to process before
> > deciding for the rest. 1000 lines could work for the small sized data
> > but it might not work for the big sized data. Also, it might cause
> > worse regressions for the small sized data.
>
> IMHO we have some leeway with smaller amounts of data. If COPY FROM for
> 1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
> unlikely users would be inconvenienced all that much. (Those numbers are
> completely made up in order to illustrate my point.)
>
> > Because of this reason, I tried to implement a heuristic that will
> > work regardless of the size of the data. The last heuristic I
> > suggested will run SIMD for approximately (#number_of_lines / 1024
> > [1024 is the max number of lines to sleep before running SIMD again])
> > lines if all characters in the data are special characters.
>
> I wonder if we could mitigate the regression further by spacing out the
> checks a bit more. It could be worth comparing a variety of values to
> identify what works best with the test data.

Do you mean that instead of doubling the SIMD sleep, we should multiply it by 3 (or another factor)? Or are you referring to increasing the maximum sleep from 1024? Or possibly both?

--
Regards,
Nazir Bilal Yavuz
Microsoft