Re: speed up verifying UTF-8

From: Heikki Linnakangas
Subject: Re: speed up verifying UTF-8
Msg-id: e7729297-53e8-6e17-7334-7227043ce716@iki.fi
In response to: Re: speed up verifying UTF-8  (John Naylor <john.naylor@enterprisedb.com>)
Responses: Re: speed up verifying UTF-8  (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers

On 07/06/2021 15:39, John Naylor wrote:
> On Mon, Jun 7, 2021 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>  >
>  > On 03/06/2021 21:58, John Naylor wrote:
>  > > The microbenchmark is the same one you attached to [1], which I
>  > > extended with a 95% multibyte case.
>  >
>  > Could you share the exact test you're using? I'd like to test this on my
>  > old raspberry pi, out of curiosity.
> 
> Sure, attached.
> 
> --
> John Naylor
> EDB: http://www.enterprisedb.com
> 
Results from chipmunk, my first-generation Raspberry Pi:

Master:

 chinese | mixed | ascii
---------+-------+-------
   25392 | 16287 | 10295
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii
---------+-------+-------
   17739 | 10854 |  4121
(1 row)

So that's good: the patch is faster on all three inputs, and about 2.5x
faster on pure ASCII.

What is the worst-case scenario for this algorithm? It would be input
where the new fast ASCII check never helps, but which the old code
handles as fast as possible. To test that, I added a repeating pattern
of '123456789012345ä' to the test set as "mixed2" (these results are
from my Intel laptop, not the Raspberry Pi):

Master:

 chinese | mixed | ascii | mixed2
---------+-------+-------+--------
    1333 |   757 |   410 |    573
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii | mixed2
---------+-------+-------+--------
     942 |   470 |    66 |   1249
(1 row)
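
For reference, here is a small sketch of why the mixed2 pattern is so
hostile to the fast path. It assumes the fast path only succeeds when a
full 16-byte chunk is pure ASCII, and that 'ä' is stored as the two
UTF-8 bytes 0xC3 0xA4, which makes the pattern 17 bytes long with at
most 15 ASCII bytes in a row. The helper names fill_mixed2 and
count_ascii_windows are made up for this illustration and are not part
of the patch or the benchmark.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill buf with `reps` copies of the 17-byte
 * pattern "123456789012345" + UTF-8 'ä' (0xC3 0xA4).
 * Returns the number of bytes written; buf must hold reps * 17 bytes.
 */
static size_t
fill_mixed2(unsigned char *buf, size_t reps)
{
    static const char pattern[] = "123456789012345" "\xc3\xa4";
    const size_t plen = sizeof(pattern) - 1;    /* 17 bytes, no NUL */

    for (size_t r = 0; r < reps; r++)
        memcpy(buf + r * plen, pattern, plen);
    return reps * plen;
}

/*
 * Hypothetical helper: count the 16-byte windows, starting at every
 * byte offset, that contain only ASCII bytes (high bit clear).
 */
static size_t
count_ascii_windows(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i + 16 <= len; i++)
    {
        int ascii = 1;

        for (size_t j = 0; j < 16; j++)
        {
            if (buf[i + j] & 0x80)
            {
                ascii = 0;
                break;
            }
        }
        count += ascii;
    }
    return count;
}
```

Under those assumptions, no 16-byte window in the repeated pattern is
all ASCII, whatever offset the verifier happens to be at, so every
attempt at the fast path is wasted work on top of the normal
per-character verification.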

So there's a regression with that input: mixed2 goes from 573 to 1249,
a bit more than 2x slower. Maybe that's acceptable; this is the worst
case, after all. Or you could tweak check_ascii for a different
performance tradeoff, by checking the two 64-bit words separately and
returning "8" if the failure happens in the second word. And I haven't
tried the SSE patch yet; maybe that compensates for this.
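
To make the check_ascii idea concrete, here is a rough sketch of what
checking the two words separately could look like. This is not the code
from the v11 patch; the names check_ascii_split and
word_is_ascii_nonzero are invented for illustration. It also treats
zero bytes as a fast-path failure, on the assumption that the verifier
has to reject embedded NULs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS UINT64_C(0x8080808080808080)
#define LOW_7F    UINT64_C(0x7f7f7f7f7f7f7f7f)

/*
 * Hypothetical helper: true if all 8 bytes of w are in 0x01..0x7F.
 */
static inline int
word_is_ascii_nonzero(uint64_t w)
{
    /* Any high bit set means a non-ASCII byte. */
    if (w & HIGH_BITS)
        return 0;

    /*
     * All bytes are <= 0x7F here, so adding 0x7F to each byte cannot
     * carry into its neighbour. The high bit of a byte ends up set
     * exactly when that byte was nonzero.
     */
    return ((w + LOW_7F) & HIGH_BITS) == HIGH_BITS;
}

/*
 * Hypothetical variant of the fast path: look at 16 bytes as two
 * 64-bit words and return how many leading bytes (0, 8 or 16) were
 * verified, instead of all-or-nothing.
 */
static inline int
check_ascii_split(const unsigned char *s, size_t len)
{
    uint64_t w1;
    uint64_t w2;

    if (len < 16)
        return 0;

    /* memcpy avoids unaligned loads; compilers turn it into a load. */
    memcpy(&w1, s, sizeof(w1));
    memcpy(&w2, s + 8, sizeof(w2));

    if (!word_is_ascii_nonzero(w1))
        return 0;
    if (!word_is_ascii_nonzero(w2))
        return 8;       /* first word is fine, second is not */
    return 16;
}
```

With the mixed2 pattern, a variant like this would often still get
credit for 8 bytes instead of throwing the whole chunk away, at the
cost of an extra branch. Whether that is a net win would have to be
measured.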

- Heikki