speed up verifying UTF-8

From	John Naylor
Subject	speed up verifying UTF-8
Date	2 June 2021, 16:26:41
Msg-id	CAFBsxsHii1-wbwN7vEbpzK03VJJL=EXegJSz6RSXbXZeaUB2jA@mail.gmail.com
In reply to	Re: [POC] verifying UTF-8 using SIMD instructions (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: speed up verifying UTF-8
List	pgsql-hackers

For v10, I've split the patch up into two parts. 0001 uses pure C everywhere. This is much smaller and easier to review, and gets us the most bang for the buck.

One concern Heikki raised upthread is that platforms with poor unaligned-memory access will see a regression. We could easily add an #ifdef to take care of that, but I haven't done so here.

To recap: On ASCII-only input with storage taken out of the picture, profiles of COPY FROM show UTF-8 verification dropping from nearly 10% down to just over 1%. In microbenchmarks found earlier in this thread, this works out to about 7 times faster. On multibyte/mixed input, 0001 is a bit faster, but not really enough to make a difference in COPY performance.
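
For readers who have not followed the thread, here is a rough illustration of the kind of pure-C fast path that makes ASCII-only input cheap: check eight bytes at a time for a set high bit, and fall back to a byte loop otherwise. This is only a sketch under my own assumptions, not the code in 0001. The function name ascii_prefix_len and the macro HYPOTHETICAL_SLOW_UNALIGNED_ACCESS are invented for illustration; the macro stands in for the kind of #ifdef that could skip the chunked loop on platforms with poor unaligned access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return the length of the
 * leading run of ASCII bytes in s[0..len).
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		i = 0;

	assert(s != NULL || len == 0);

/* Illustrative guard only; this macro is not a real PostgreSQL symbol. */
#ifndef HYPOTHETICAL_SLOW_UNALIGNED_ACCESS
	/* Test 8 bytes per iteration: any set high bit means non-ASCII. */
	while (len - i >= sizeof(uint64_t))
	{
		uint64_t	chunk;

		/* memcpy keeps the load well-defined for unaligned pointers. */
		memcpy(&chunk, s + i, sizeof(chunk));
		if (chunk & UINT64_C(0x8080808080808080))
			break;
		i += sizeof(chunk);
	}
#endif

	/* Byte-at-a-time tail; also pinpoints the first non-ASCII byte. */
	while (i < len && s[i] < 0x80)
		i++;

	return i;
}
```

A caller would pass the ASCII prefix straight through and hand only the remainder to the regular multibyte verifier, which is consistent with mixed input seeing a much smaller gain than pure ASCII.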

0002 adds the SSE4 implementation on x86-64, and is equally fast on all input, at the cost of greater complexity.

To reflect the split, I've changed the thread subject and the commitfest title.

John Naylor

EDB: http://www.enterprisedb.com
