Re: WIP Incremental JSON Parser

From: Andrew Dunstan
Subject: Re: WIP Incremental JSON Parser
Date:
Msg-id: 98787c69-7581-456a-b726-fc93416f0b15@dunslane.net
In response to: Re: WIP Incremental JSON Parser  (Andrew Dunstan <andrew@dunslane.net>)
Responses: Re: WIP Incremental JSON Parser  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
On 2024-04-09 Tu 15:42, Andrew Dunstan wrote:
>
> On 2024-04-09 Tu 07:54, Andrew Dunstan wrote:
>>
>>
>> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>>> On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:
>>>> There is no direct check on test_json_parser_perf.c, either, only a
>>>> custom rule in the Makefile without specifying something for meson.
>>>> So it looks like you could do short execution check in a TAP test, at
>>>> least.
>>
>>
>> Not adding a test for that was deliberate - any sane test takes a 
>> while, and I didn't want to spend that much time on it every time 
>> someone runs "make check-world" or equivalent. However, adding a test 
>> to run it with a trivial number of iterations seems reasonable, so 
>> I'll add that. I'll also add a meson target for the binary.
>>
>>
>>> While reading the code, I've noticed a few things, as well:
>>>
>>> +   /* max delicious line length is less than this */
>>> +   char        buff[6001];
>>>
>>> Delicious applies to the contents, nothing to do with this code even
>>> if I'd like to think that these lines of code are edible and good.
>>> Using a hardcoded limit for some JSON input sounds like a bad idea to
>>> me even if this is a test module.  Comment that applies to both the
>>> perf and the incremental tools.  You could use a #define'd buffer size
>>> for readability rather than assuming this value in many places.
>>
>>
>> The comment is a remnant of when I hadn't yet added support for 
>> incomplete tokens, and so I had to parse the input line by line. I 
>> agree it can go, and we can use a manifest constant for the buffer size.
>>
>>
>>> +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
>>> + * This progam tests incremental parsing of json. The input is fed 
>>> into
>>> + * full range of incement handling, especially in the lexer, is 
>>> exercised.
>>> +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
>>> + *    Performancet est program for both flavors of the JSON parser
>>> + * This progam tests either the standard (recursive descent) JSON 
>>> parser
>>> +++ b/src/test/modules/test_json_parser/README
>>> +  reads in a file and pases it in very small chunks (60 bytes at a 
>>> time) to
>>>
>>> Collection of typos across various files.
>>
>>
>> Will fix. (The older I get the more typos I seem to make and the 
>> harder it is to notice them. It's very annoying.)
>>
>>
>>> + appendStringInfoString(&json, "1+23 trailing junk");
>>> What's the purpose here?  Perhaps the intention should be documented
>>> in a comment?
>>
>>
>> The purpose is to ensure that if there is not a trailing '\0' on the 
>> json chunk the parser will still do the right thing. I'll add a 
>> comment to that effect.
>>
>>
>>> At the end, having a way to generate JSON blobs randomly to test this
>>> stuff would be more appealing than what you have currently, with
>>> perhaps a few factors like:
>>> - Array and simple object density.
>>> - Max Level of nesting.
>>> - Limitation to ASCII characters that can be escaped.
>>> - Perhaps more things I cannot think about?
>>
>>
>> No, I disagree. Maybe we need those things as well, but we do need a 
>> static test where we can test the output against known results. I 
>> have no objection to changing the input and output files.
>>
>> It's worth noting that t/002_inline.pl does generate some input and 
>> test e.g., the maximum nesting levels among other errors. Perhaps you 
>> missed that. If you think we need more tests there adding them would 
>> be extremely simple.
>>
>>
>>> So the current state of things is kind of disappointing, and the size
>>> of the data set added to the tree is not that portable either if you
>>> want to test various scenarios depending on the data set.  It seems to
>>> me that this has been committed too hastily and that this is not ready
>>> for integration, even if that's just a test module.  Tom also has
>>> shared some concerns upthread, as far as I can see.
>>
>>
>> I have posted a patch already that addresses the issue Tom raised.
>>
>>
>
>
> Here's a consolidated set of cleanup patches, including the memory 
> leak patch and Jacob's shrink-tiny patch.


Here's v2 of the cleanup patch 4, that fixes some more typos kindly 
pointed out to me by Alexander Lakhin.


cheers


andrew




--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachments
