Re: speed up verifying UTF-8

From	John Naylor
Subject	Re: speed up verifying UTF-8
Date	July 28, 2021, 18:12:11
Msg-id	CAFBsxsH=jfWgo7-ToygfdjnC60C3V_N=6=EoCfQ50U3cED_W8g@mail.gmail.com
In reply to	Re: speed up verifying UTF-8 (John Naylor <john.naylor@enterprisedb.com>)
List	pgsql-hackers

I wrote:

> On Mon, Jul 26, 2021 at 7:55 AM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> >
> > >+ utf8_advance(s, state, len);
> > >+
> > >+ /*
> > >+ * If we saw an error during the loop, let the caller handle it. We treat
> > >+ * all other states as success.
> > >+ */
> > >+ if (state == ERR)
> > >+ return 0;
> >
> > Did you mean state = utf8_advance(s, state, len); there? (reassign state variable)
>
> Yep, that's a bug, thanks for catching!

Fixed in v21, with a regression test added. Also, utf8_advance() now updates the state directly through a pointer argument rather than returning a value, so the result can no longer be silently dropped. Some cosmetic changes:
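For readers following along, here is a minimal sketch of the pointer-based calling convention. It is not the code from the patch: toy_state, toy_step, toy_advance and toy_chunk_ok are hypothetical names, and the state machine is a deliberately simplified one that only tracks how many continuation bytes are still expected (it does not reject overlong encodings or surrogates, which a real validator must).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified states: ERR, "between characters", and
 * "expecting 1/2/3 more continuation bytes". Not the patch's DFA. */
typedef enum { ERR = 0, BGN, CS1, CS2, CS3 } toy_state;

/* One transition of the toy state machine. */
static toy_state
toy_step(toy_state st, unsigned char c)
{
    if (st == ERR)
        return ERR;             /* the error state is sticky */
    if (st == BGN)
    {
        if (c < 0x80)
            return BGN;         /* ASCII */
        if (c >= 0xC2 && c <= 0xDF)
            return CS1;         /* 2-byte lead */
        if (c >= 0xE0 && c <= 0xEF)
            return CS2;         /* 3-byte lead */
        if (c >= 0xF0 && c <= 0xF4)
            return CS3;         /* 4-byte lead */
        return ERR;
    }
    /* A continuation byte is expected; each one moves a step toward BGN. */
    if (c >= 0x80 && c <= 0xBF)
        return (toy_state) (st - 1);
    return ERR;
}

/* Updates *state in place, so there is no return value to forget. */
static void
toy_advance(const unsigned char *s, toy_state *state, size_t len)
{
    assert(state != NULL);
    while (len-- > 0)
        *state = toy_step(*state, *s++);
}

/* Returns 0 only if an error was seen; every other state counts as
 * success, leaving incomplete sequences for the caller to deal with. */
static int
toy_chunk_ok(const unsigned char *s, size_t len)
{
    toy_state   state = BGN;

    toy_advance(s, &state, len);
    if (state == ERR)
        return 0;
    return 1;
}
```

With the earlier by-value form, a call like utf8_advance(s, state, len) whose result is discarded leaves the caller's state unchanged, so the ERR check can never fire. Passing a pointer removes that failure mode.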

s/valid_bytes/non_error_bytes/ since the former is kind of misleading now.

Some other variable name and symbol changes. In my first DFA experiment, ASC conflicted with the parser or scanner somehow, but it doesn't here, so it's clearer to use this.

Rewrote a lot of comments about the state machine and regression tests.
--
John Naylor
EDB: http://www.enterprisedb.com

Attachments

v21-0001-Add-fast-paths-for-validating-UTF-8-text.patch
