Re: speed up verifying UTF-8

From	Heikki Linnakangas
Subject	Re: speed up verifying UTF-8
Date	June 9, 2021, 11:02:02
Msg-id	e7729297-53e8-6e17-7334-7227043ce716@iki.fi
In response to	Re: speed up verifying UTF-8 (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: speed up verifying UTF-8
List	pgsql-hackers


On 07/06/2021 15:39, John Naylor wrote:
> On Mon, Jun 7, 2021 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > On 03/06/2021 21:58, John Naylor wrote:
> > > The microbenchmark is the same one you attached to [1], which I
> > > extended with a 95% multibyte case.
> >
> > Could you share the exact test you're using? I'd like to test this on my
> > old raspberry pi, out of curiosity.
>
> Sure, attached.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>
Results from chipmunk, my first-generation Raspberry Pi:

Master:

 chinese | mixed | ascii
---------+-------+-------
   25392 | 16287 | 10295
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii
---------+-------+-------
   17739 | 10854 |  4121
(1 row)

So that's good.

What is the worst-case scenario for this algorithm? It would be an input
where the new fast ASCII check never helps, but which the old code
handles as fast as possible. To test that, I added a repeating pattern of
'123456789012345ä' to the test set as "mixed2" (these results are from my
Intel laptop, not the Raspberry Pi):

Master:

 chinese | mixed | ascii | mixed2
---------+-------+-------+--------
    1333 |   757 |   410 |    573
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii | mixed2
---------+-------+-------+--------
     942 |   470 |    66 |   1249
(1 row)
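
For anyone who wants to see why the mixed2 pattern defeats a 16-byte
ASCII fast path, here is a minimal sketch. It assumes the 'ä' is encoded
as the two UTF-8 bytes 0xC3 0xA4, which makes the pattern 17 bytes long
with only 15 consecutive ASCII bytes, so no 16-byte window can be pure
ASCII. The helper names fill_mixed2 and count_ascii_windows are made up
for illustration and are not part of the patch or the benchmark.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill buf with as many whole copies of the
 * 17-byte pattern as fit into cap bytes. Returns the bytes written.
 * Assumes 'ä' is stored as UTF-8 (0xC3 0xA4).
 */
static size_t
fill_mixed2(unsigned char *buf, size_t cap)
{
    static const char pattern[] = "123456789012345\xc3\xa4";
    const size_t plen = sizeof(pattern) - 1;    /* drop the trailing NUL */
    size_t      n = 0;

    while (n + plen <= cap)
    {
        memcpy(buf + n, pattern, plen);
        n += plen;
    }
    return n;
}

/*
 * Hypothetical helper: count the 16-byte windows, at every possible
 * start offset, that contain only bytes below 0x80.
 */
static size_t
count_ascii_windows(const unsigned char *buf, size_t len)
{
    size_t      count = 0;

    for (size_t start = 0; start + 16 <= len; start++)
    {
        int         all_ascii = 1;

        for (size_t i = 0; i < 16; i++)
        {
            if (buf[start + i] & 0x80)
            {
                all_ascii = 0;
                break;
            }
        }
        count += all_ascii;
    }
    return count;
}
```

Calling count_ascii_windows on a buffer filled by fill_mixed2 finds no
qualifying window, whatever offset the check starts from.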

So there's a regression with that input (mixed2 goes from 573 to 1249,
roughly 2.2x slower). Maybe that's acceptable; this is the worst case,
after all. Or you could tweak check_ascii for a different performance
tradeoff, by checking the two 64-bit words separately and returning "8"
if the failure happens in the second word. I haven't tried the SSE patch
yet; maybe that compensates for this.
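
To make the check_ascii suggestion concrete, here is a rough sketch of
the split-word variant. It is not the code from the v11 patch: the names
check_ascii_split and word_is_plain_ascii are hypothetical, and treating
a zero byte as a failure is my assumption about what the fast path should
do. The return value is the number of leading bytes known to be plain
ASCII: 16, 8, or 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS UINT64_C(0x8080808080808080)
#define LOW_ONES  UINT64_C(0x0101010101010101)

/*
 * Hypothetical helper: true if all eight bytes of w are nonzero and
 * below 0x80.
 */
static inline int
word_is_plain_ascii(uint64_t w)
{
    /* Any high bit set means a non-ASCII byte. */
    if (w & HIGH_BITS)
        return 0;

    /*
     * With all high bits clear, subtracting 1 from every byte sets a
     * high bit only if some byte was zero and had to borrow.
     */
    if ((w - LOW_ONES) & HIGH_BITS)
        return 0;

    return 1;
}

/*
 * Sketch of a split check: test the two 64-bit words separately, so
 * that a failure in the second word still lets the caller skip the
 * first 8 bytes.
 */
static int
check_ascii_split(const unsigned char *s, size_t len)
{
    uint64_t    half1;
    uint64_t    half2;

    if (len < 2 * sizeof(uint64_t))
        return 0;

    /* memcpy avoids unaligned loads */
    memcpy(&half1, s, sizeof(uint64_t));
    if (!word_is_plain_ascii(half1))
        return 0;

    memcpy(&half2, s + sizeof(uint64_t), sizeof(uint64_t));
    if (!word_is_plain_ascii(half2))
        return 8;

    return 16;
}
```

At the start of the mixed2 pattern the first word is all ASCII and the
second word contains the lead byte of 'ä', so this variant would return
8 there instead of 0. The cost is an extra branch per chunk, which is
the tradeoff that would need measuring on pure-ASCII input.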

- Heikki
