Re: WIP Incremental JSON Parser

From Andrew Dunstan
Subject Re: WIP Incremental JSON Parser
Date
Msg-id 126d90b3-140e-5b99-eefb-b670c493f2d0@dunslane.net
In reply to Re: WIP Incremental JSON Parser  (Jacob Champion <champion.p@gmail.com>)
Responses Re: WIP Incremental JSON Parser  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On 2024-01-09 Tu 13:46, Jacob Champion wrote:
> On Tue, Dec 26, 2023 at 8:49 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>> Quite a long time ago Robert asked me about the possibility of an
>> incremental JSON parser. I wrote one, and I've tweaked it a bit, but the
>> performance is significantly worse than that of the current Recursive
>> Descent parser.
> The prediction stack is neat. It seems like the main loop is hit so
> many thousands of times that micro-optimization would be necessary...
> I attached a sample diff to get rid of the strlen calls during
> push_prediction(), which speeds things up a bit (8-15%, depending on
> optimization level) on my machines.
>
> Maybe it's possible to condense some of those productions down, and
> reduce the loop count? E.g. does every "scalar" production need to go
> three times through the loop/stack, or can the scalar semantic action
> just peek at the next token prediction and do all the callback work at
> once?
>
>> +               case JSON_SEM_SCALAR_CALL:
>> +                   {
>> +                       json_scalar_action sfunc = sem->scalar;
>> +
>> +                       if (sfunc != NULL)
>> +                           (*sfunc) (sem->semstate, scalar_val, scalar_tok);
>> +                   }
> Is it safe to store state (scalar_val/scalar_tok) on the stack, or
> does it disappear if the parser hits an incomplete token?
>
>> One possible use would be in parsing large manifest files for
>> incremental backup.
> I'm keeping an eye on this thread for OAuth, since the clients have to
> parse JSON as well. Those responses tend to be smaller, though, so
> you'd have to really be hurting for resources to need this.
>

I've incorporated your suggestion, and fixed the bug you identified.
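
For readers following along, here is a rough sketch of the idea behind the strlen suggestion: store each production's length next to its symbols so that pushing it never has to call strlen(). This is not the code from the attached patch. The struct layout, the PROD macro, the symbol letters and the fixed stack size are all made up for illustration; only the name push_prediction() comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the attached patch may look quite different. */
#define PSTACK_SIZE 64

typedef struct
{
    size_t      len;    /* number of symbols, fixed at compile time */
    const char *syms;   /* right-hand side of the production */
} production;

/* sizeof on a string literal counts the trailing NUL, hence the -1. */
#define PROD(s) { sizeof(s) - 1, (s) }

typedef struct
{
    char    stack[PSTACK_SIZE];
    size_t  depth;
} pstack;

/* Made-up symbols: '[' and ']' are tokens, 'V' a value, 'M' "more elements". */
static const production array_prod = PROD("[VM]");

/*
 * Push a production in reverse order so that its first symbol is popped
 * first. The length comes from the table, so no strlen() call is needed.
 * Returns 0 if the stack would overflow, 1 otherwise.
 */
static int
push_prediction(pstack *ps, const production *p)
{
    if (ps->depth + p->len > PSTACK_SIZE)
        return 0;
    for (size_t i = p->len; i > 0; i--)
        ps->stack[ps->depth++] = p->syms[i - 1];
    return 1;
}

/* Pop the next predicted symbol; the caller must ensure depth > 0. */
static char
pop_prediction(pstack *ps)
{
    return ps->stack[--ps->depth];
}
```

Since the main loop runs once per predicted symbol, taking a per-push strlen() out of it is the kind of micro-optimization Jacob was describing.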


The attached also adds support for incrementally parsing backup 
manifests, and uses that in the three places we call the manifest parser.
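
To illustrate why chunked input matters for the question about state surviving an incomplete token, here is a toy sketch, unrelated to the real parser code. It feeds input in arbitrary chunks and keeps the half-read token in a context struct rather than in local variables, so nothing is lost when a chunk ends mid-token. The names toy_lexer and toy_feed are hypothetical, it only recognizes double-quoted strings, and it ignores backslash escapes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOK_MAX 128

/* All state that must outlive one call lives here, not on the C stack. */
typedef struct
{
    char    partial[TOK_MAX];   /* bytes of the string token seen so far */
    size_t  partial_len;
    int     in_string;          /* did the last chunk end inside "..."? */
    int     strings_seen;       /* number of complete string tokens */
    char    last[TOK_MAX];      /* most recent complete string */
} toy_lexer;

/*
 * Consume one chunk. A string split across chunks is accumulated in
 * lx->partial and only reported once its closing quote arrives.
 * Overlong strings are silently truncated to TOK_MAX - 1 bytes.
 */
static void
toy_feed(toy_lexer *lx, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        char    c = chunk[i];

        if (!lx->in_string)
        {
            if (c == '"')
            {
                lx->in_string = 1;
                lx->partial_len = 0;
            }
        }
        else if (c == '"')
        {
            lx->partial[lx->partial_len] = '\0';
            memcpy(lx->last, lx->partial, lx->partial_len + 1);
            lx->strings_seen++;
            lx->in_string = 0;
        }
        else if (lx->partial_len < TOK_MAX - 1)
            lx->partial[lx->partial_len++] = c;
    }
}
```

A caller would zero the struct once, then call toy_feed() for each buffer read from the file. Feeding `{"pa` and then `th": 1}` delivers the single string `path` only after the second call.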


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
