Re: Regression with large XML data input
From | Jim Jones |
---|---|
Subject | Re: Regression with large XML data input |
Date | |
Msg-id | cc0bd778-9730-4ef9-98b3-a965f8895331@uni-muenster.de |
In response to | Re: Regression with large XML data input (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: Regression with large XML data input |
List | pgsql-hackers |
On 28.07.25 04:47, Michael Paquier wrote:
> I understand that from the point of view of a
> maintainer this is rather bad, but from the customer point of view the
> current situation is also bad to deal with in the scope of a minor
> upgrade, because applications suddenly break.

I totally get it --- from the user’s perspective, it’s hard to see this
as a bugfix.

I was wondering whether using XML_PARSE_HUGE in xml_parse's options
could help address this, for example:

options = XML_PARSE_NOENT | XML_PARSE_DTDATTR | XML_PARSE_HUGE
    | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS);

According to libxml2's parserInternals.h:

/**
 * Maximum size allowed for a single text node when building a tree.
 * This is not a limitation of the parser but a safety boundary feature,
 * use XML_PARSE_HUGE option to override it.
 * Introduced in 2.9.0
 */
#define XML_MAX_TEXT_LENGTH 10000000

/**
 * Maximum size allowed when XML_PARSE_HUGE is set.
 */
#define XML_MAX_HUGE_LENGTH 1000000000

The XML_MAX_TEXT_LENGTH limit is what we're hitting now, but
XML_MAX_HUGE_LENGTH is extremely generous. Here's a quick PoC using
XML_PARSE_HUGE:

psql (19devel)
Type "help" for help.

postgres=# CREATE TABLE xmldata (message xml);
CREATE TABLE
postgres=# DO $$
DECLARE huge_size text := repeat('X', 1000000000);
BEGIN
  INSERT INTO xmldata (message) VALUES
  ((('<foo><bar>' || huge_size ||'</bar></foo>')::xml));
END $$;
DO
postgres=# SELECT pg_size_pretty(length(message::text)::bigint) FROM xmldata;
 pg_size_pretty
----------------
 954 MB
(1 row)

While XML_MAX_HUGE_LENGTH prevents unlimited memory usage, it still
opens the door to potential resource exhaustion. I couldn't find a way
to dynamically adjust this limit in libxml2.

One idea would be to guard XML_PARSE_HUGE behind a GUC --- say,
xml_enable_huge_parsing. That would at least allow controlled
environments to opt in. But of course, that wouldn't help current
releases.

Best regards, Jim
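
PS: purely as an illustration, here is a minimal sketch of how the GUC idea above might gate XML_PARSE_HUGE when the option mask is built. It is not a patch. xml_enable_huge_parsing is the hypothetical GUC named above (shown here as a plain variable, without any GUC registration), build_xml_parse_options is a made-up helper, and the SKETCH_* constants are local stand-ins mirroring libxml2's xmlParserOption bit values so that the snippet compiles without libxml2. Real code would include <libxml/parser.h> and use the XML_PARSE_* names directly.

```c
#include <stdbool.h>

/*
 * Local stand-ins for libxml2's xmlParserOption bits (assumption: these
 * mirror XML_PARSE_NOENT, XML_PARSE_DTDATTR, XML_PARSE_NOBLANKS and
 * XML_PARSE_HUGE; real code would use the libxml2 enum instead).
 */
enum
{
    SKETCH_PARSE_NOENT    = 1 << 1,
    SKETCH_PARSE_DTDATTR  = 1 << 3,
    SKETCH_PARSE_NOBLANKS = 1 << 8,
    SKETCH_PARSE_HUGE     = 1 << 19
};

/* Hypothetical GUC backing variable; off by default so users must opt in. */
static bool xml_enable_huge_parsing = false;

/*
 * Hypothetical helper: compose the parser option mask, adding the HUGE
 * flag only when the opt-in setting is enabled.
 */
static int
build_xml_parse_options(bool preserve_whitespace)
{
    int options = SKETCH_PARSE_NOENT | SKETCH_PARSE_DTDATTR
        | (preserve_whitespace ? 0 : SKETCH_PARSE_NOBLANKS);

    if (xml_enable_huge_parsing)
        options |= SKETCH_PARSE_HUGE;

    return options;
}
```

With the setting left at its default, the mask is the same as without the HUGE flag, so only environments that opt in would get the larger limit.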