Re: Speed up COPY FROM text/CSV parsing using SIMD
| From | Nathan Bossart |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | aPfTiX0HwV42R6Od@nathan |
| In reply to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
| Responses | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> I think the problem is deciding how many lines to process before
> deciding for the rest. 1000 lines could work for the small-sized data
> but it might not work for the big-sized data. Also, it might cause
> worse regressions for the small-sized data.

IMHO we have some leeway with smaller amounts of data. If COPY FROM for
1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
unlikely users would be inconvenienced all that much. (Those numbers are
completely made up in order to illustrate my point.)

> Because of this reason, I tried to implement a heuristic that will
> work regardless of the size of the data. The last heuristic I
> suggested will run SIMD for approximately (#number_of_lines / 1024
> [1024 is the max number of lines to sleep before running SIMD again])
> lines if all characters in the data are special characters.

I wonder if we could mitigate the regression further by spacing out the
checks a bit more. It could be worth comparing a variety of values to
identify what works best with the test data.

--
nathan
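
For illustration only, here is a minimal C sketch of the kind of backoff
heuristic being discussed: retry the SIMD path periodically, and double
the gap between attempts (capped at 1024 lines, per the quoted text)
whenever an attempt fails to skip anything. The type and function names
here are hypothetical and are not taken from the actual patch.

```c
#include <stdbool.h>

/* Assumed cap from the quoted heuristic: sleep at most this many lines. */
#define SIMD_RETRY_MAX_LINES 1024

/* Hypothetical per-COPY state; initialize as {0, 1}. */
typedef struct SimdBackoffState
{
	int			lines_until_retry;	/* scalar-parse this many more lines */
	int			backoff;			/* gap to use after the next failure */
} SimdBackoffState;

/* Decide whether this line should attempt the SIMD fast path. */
static inline bool
simd_should_try(SimdBackoffState *state)
{
	if (state->lines_until_retry > 0)
	{
		state->lines_until_retry--;
		return false;			/* still sleeping; use scalar parsing */
	}
	return true;
}

/* Record whether the SIMD attempt actually skipped a useful chunk. */
static inline void
simd_record_result(SimdBackoffState *state, bool useful)
{
	if (useful)
	{
		/* SIMD is paying off: keep using it on every line. */
		state->backoff = 1;
		state->lines_until_retry = 0;
	}
	else
	{
		/* Hit special characters immediately: wait longer before retrying. */
		state->lines_until_retry = state->backoff;
		state->backoff *= 2;
		if (state->backoff > SIMD_RETRY_MAX_LINES)
			state->backoff = SIMD_RETRY_MAX_LINES;
	}
}
```

With this shape, input consisting entirely of special characters
converges to roughly one SIMD attempt per 1024 lines, matching the
~(#number_of_lines / 1024) attempts described above, while data that
benefits from SIMD keeps the fast path enabled on every line. Spacing
the checks out further would just mean a larger cap.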