Re: WIP Incremental JSON Parser

From: Andrew Dunstan
Subject: Re: WIP Incremental JSON Parser
Msg-id: 8447d168-b981-2601-8ad0-53827fe18e5a@dunslane.net
In reply to: Re: WIP Incremental JSON Parser  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses: Re: WIP Incremental JSON Parser
List: pgsql-hackers
On 2024-04-02 Tu 15:38, Jacob Champion wrote:
> On Mon, Apr 1, 2024 at 4:53 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>> Anyway, here are new patches. I've rolled the new semantic test into the
>> first patch.
> Looks good! I've marked RfC.


Thanks! I appreciate all the work you've done on this. I will give it 
one more pass and commit RSN.

>
>> json_lex() is not really a very hot piece of code.
> Sure, but I figure if someone is trying to get the performance of the
> incremental parser to match the recursive one, so we can eventually
> replace it, it might get a little warmer. :)

I don't think this is where the performance penalty lies. Rather, I 
suspect it's the stack operations in the non-recursive parser itself. 
The speed test doesn't involve any partial token processing at all, and 
yet the non-recursive parser is significantly slower in that test.
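
To make the "stack operations" point concrete, here is a minimal sketch on a toy grammar (bracket nesting only, not JSON and not the PostgreSQL code). The recursive form lets the C call stack track nesting, while the non-recursive form pushes and pops an explicit array on every bracket. All names here (check_recursive, balanced_iterative, MAX_DEPTH) are hypothetical, and the sketch says nothing about where the real cost lies; it only shows the kind of bookkeeping involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64			/* arbitrary limit, for this sketch only */

/*
 * Recursive form: p points at '[' or '{'. Returns a pointer just past the
 * matching close bracket, or NULL on a mismatch or premature end of input.
 */
static const char *
check_recursive(const char *p)
{
	char		close = (*p == '[') ? ']' : '}';

	p++;
	while (*p == '[' || *p == '{')
	{
		p = check_recursive(p);
		if (p == NULL)
			return NULL;
	}
	return (*p == close) ? p + 1 : NULL;
}

static bool
balanced_recursive(const char *s)
{
	const char *end;

	if (*s != '[' && *s != '{')
		return false;
	end = check_recursive(s);
	return end != NULL && *end == '\0';
}

/* Non-recursive form: the expected close brackets live on an explicit stack. */
static bool
balanced_iterative(const char *p)
{
	char		stack[MAX_DEPTH];
	size_t		depth = 0;

	do
	{
		if (*p == '[' || *p == '{')
		{
			if (depth == MAX_DEPTH)
				return false;	/* nesting too deep for this sketch */
			stack[depth++] = (*p == '[') ? ']' : '}';	/* push */
		}
		else if (depth > 0 && *p == stack[depth - 1])
			depth--;			/* pop */
		else
			return false;		/* mismatched, stray, or missing bracket */
		p++;
	} while (depth > 0);

	return *p == '\0';
}
```

Both functions accept exactly one balanced top-level group, for example "[{}[]]", and reject inputs such as "[}" or "[[". The only behavioral difference is that the iterative form also rejects nesting deeper than MAX_DEPTH.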

>>> I think it'd be good for a v1.x of this feature to focus on
>>> simplification of the code, and hopefully consolidate and/or eliminate
>>> some of the duplicated parsing work so that the mental model isn't
>>> quite so big.
>> I'm not sure how you think that can be done.
> I think we'd need to teach the lower levels of the lexer about
> incremental parsing too, so that we don't have two separate sources of
> truth about what ends a token. Bonus points if we could keep the parse
> state across chunks to the extent that we didn't need to restart at
> the beginning of the token every time. (Our current tools for this are
> kind of poor, like the restartable state machine in PQconnectPoll.
> While I'm dreaming, I'd like coroutines.) Now, whether the end result
> would be more or less maintainable is left as an exercise...
>

I tried to disturb the main lexer processing as little as possible. We 
could possibly unify the two paths, but I have a strong suspicion that 
would result in a performance hit (the main part of the lexer doesn't 
copy any characters at all, it just keeps track of pointers into the 
input). And while the code might not undergo lots of change, the routine 
itself is quite performance critical.
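
As a rough illustration of the trade-off described above, the sketch below contrasts a scanner that only records pointers into the input with one that has to copy bytes because a token may continue in the next chunk. It is a simplified assumption-laden example handling only runs of digits; TokenRef, PartialToken, scan_digits and append_digits are hypothetical names, not the actual lexer structures or functions.

```c
#include <stddef.h>
#include <string.h>

/* A token described by a pointer and length into the caller's input. */
typedef struct
{
	const char *start;
	size_t		len;
} TokenRef;

/* Zero-copy path: scan a run of ASCII digits at the start of input. */
static TokenRef
scan_digits(const char *input, size_t len)
{
	TokenRef	tok = {input, 0};

	while (tok.len < len && input[tok.len] >= '0' && input[tok.len] <= '9')
		tok.len++;
	return tok;
}

/* Hypothetical carry-over buffer for a token that spans chunks. */
typedef struct
{
	char		buf[64];
	size_t		used;
} PartialToken;

/*
 * Copying path: append the digits at the start of this chunk to the
 * carry-over buffer. Returns the number of bytes consumed from the chunk,
 * or (size_t) -1 if the buffer would overflow.
 */
static size_t
append_digits(PartialToken *pt, const char *chunk, size_t len)
{
	TokenRef	tok = scan_digits(chunk, len);

	if (tok.len > sizeof(pt->buf) - pt->used)
		return (size_t) -1;
	memcpy(pt->buf + pt->used, tok.start, tok.len);
	pt->used += tok.len;
	return tok.len;
}
```

With the whole input available, scan_digits never touches a buffer. When the number "12345" arrives as the chunks "123" and "45,", append_digits has to copy both pieces before the token can be treated as complete, which is the extra work a unified path would risk imposing on the common case.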

Anyway, I think that's all something for another day.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



