Re: Speed up COPY FROM text/CSV parsing using SIMD
| From | Nathan Bossart |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | aPfTiX0HwV42R6Od@nathan |
| In reply to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
| Responses | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| List | pgsql-hackers |
On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> I think the problem is deciding how many lines to process before
> deciding for the rest. 1000 lines could work for the small-sized data
> but it might not work for the big-sized data. Also, it might cause
> worse regressions for the small-sized data.

IMHO we have some leeway with smaller amounts of data. If COPY FROM for
1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
unlikely users would be inconvenienced all that much. (Those numbers are
completely made up in order to illustrate my point.)

> Because of this reason, I tried to implement a heuristic that will
> work regardless of the size of the data. The last heuristic I
> suggested will run SIMD for approximately (#number_of_lines / 1024
> [1024 is the max number of lines to sleep before running SIMD again])
> lines if all characters in the data are special characters.

I wonder if we could mitigate the regression further by spacing out the
checks a bit more. It could be worth comparing a variety of values to
identify what works best with the test data.

--
nathan
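
For illustration only, here is a minimal C sketch of the kind of backoff
heuristic being discussed: retry the SIMD path periodically, and double
the gap between attempts (capped at 1024 lines, per the quoted text)
whenever an attempt fails to skip anything. The type and function names
here are hypothetical and are not taken from the actual patch.

```c
#include <stdbool.h>

/* Assumed cap from the quoted heuristic: sleep at most this many lines. */
#define SIMD_RETRY_MAX_LINES 1024

/* Hypothetical per-COPY state; initialize as {0, 1}. */
typedef struct SimdBackoffState
{
	int			lines_until_retry;	/* scalar-parse this many more lines */
	int			backoff;			/* gap to use after the next failure */
} SimdBackoffState;

/* Decide whether this line should attempt the SIMD fast path. */
static inline bool
simd_should_try(SimdBackoffState *state)
{
	if (state->lines_until_retry > 0)
	{
		state->lines_until_retry--;
		return false;			/* still sleeping; use scalar parsing */
	}
	return true;
}

/* Record whether the SIMD attempt actually skipped a useful chunk. */
static inline void
simd_record_result(SimdBackoffState *state, bool useful)
{
	if (useful)
	{
		/* SIMD is paying off: keep using it on every line. */
		state->backoff = 1;
		state->lines_until_retry = 0;
	}
	else
	{
		/* Hit special characters immediately: wait longer before retrying. */
		state->lines_until_retry = state->backoff;
		state->backoff *= 2;
		if (state->backoff > SIMD_RETRY_MAX_LINES)
			state->backoff = SIMD_RETRY_MAX_LINES;
	}
}
```

With this shape, input consisting entirely of special characters
converges to roughly one SIMD attempt per 1024 lines, matching the
~(#number_of_lines / 1024) attempts described above, while data that
benefits from SIMD keeps the fast path enabled on every line. Spacing
the checks out further would just mean a larger cap.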