Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From Pavel Stehule
Subject Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Date
Msg-id CAFj8pRBDcq=-3waVae98+KpoxDySbJOvLQ2yhCup4dgdCViJpQ@mail.gmail.com
In response to Re: BUG #15420: Server crash. Segmentation fault when parsing xml file  (Sergey Mirvoda <sergey@mirvoda.com>)
Responses Re: BUG #15420: Server crash. Segmentation fault when parsing xml file  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-bugs


On Fri, Oct 5, 2018 at 14:09, Sergey Mirvoda <sergey@mirvoda.com> wrote:

On Fri, Oct 5, 2018 at 10:08 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Andrey" == Andrey Borodin <x4mmm@yandex-team.ru> writes:

 >> You're sure about that libxml2 version? I can reproduce a crash on
 >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
 >> message instead)

 Andrey> You are right, there was default 2.9.4 from OS, and 2.9.4 from
 Andrey> brew was not used.

 Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
 Andrey> xmllint: using libxml version 20904

I have a complete diagnosis of why it crashes on 2.9.4, and I can see
why it does not crash the same way on 2.9.7, but I would not bet
anything on 2.9.7 not having some comparable issue.

What happens on 2.9.4 is this (this is all inside libxml2):

 - at some point when parsing an element tag, the code decides to raise
   a fatal error and call xmlHaltParser

 - xmlHaltParser works by resetting the input buffer's "base" and "cur"
   pointers to point to a literal "" in the code (thus, a null byte)

 - xmlParseStartTag2 detects that input->base has changed, and assumes
   that this is because the buffer got reallocated; in the process of
   dealing with this, it resets input->cur to input->base + cur where
   "cur" is a local variable holding the previous offset in the buffer
   (which is now of course nonsense, so input->cur points into the
   weeds)

 - something later tries to access the byte at *input->cur and likely
   crashes (depending on many random factors, including load addresses
   of shared libraries and where in the buffer the original error was
   detected)

Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
reallocations differently so it doesn't fail the same way (it no longer
tries to modify input->cur). But there are so many ways that this error
path can screw itself up that I honestly would not trust it for one
second.
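
The sequence described above can be modeled with a small, self-contained C sketch. This is not libxml2 code: the DemoInput struct and every demo_* function are hypothetical stand-ins, and the read position is kept as an offset rather than a pointer so that the out-of-range state can be observed without dereferencing anything invalid. It only illustrates why re-applying a saved offset to the replaced base goes wrong, and why leaving the read position alone does not.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a parser input buffer. The real libxml2 structure keeps
 * "base" and "cur" as pointers; offsets are used here so the bad state
 * can be inspected safely. All names in this sketch are hypothetical. */
typedef struct {
    const char *base;  /* start of the buffer */
    size_t size;       /* readable bytes at base, including the NUL */
    size_t cur;        /* read position as an offset from base */
} DemoInput;

/* Models the halt step: the input is redirected to an empty string,
 * so the only readable byte is the terminating NUL. */
static void demo_halt(DemoInput *in)
{
    in->base = "";
    in->size = 1;
    in->cur = 0;
}

/* Models the 2.9.4-style recovery: a changed base is assumed to mean
 * the buffer was reallocated, so the saved offset is re-applied. */
static void demo_fixup_old(DemoInput *in, const char *old_base, size_t saved)
{
    if (in->base != old_base)
        in->cur = saved;
}

/* Models the later behavior: the read position is left untouched. */
static void demo_fixup_new(DemoInput *in, const char *old_base, size_t saved)
{
    (void) in;
    (void) old_base;
    (void) saved;
}

/* Runs the scenario once. Returns 1 if the read position is still
 * inside the buffer after the halt and fixup, 0 otherwise. */
static int demo_cursor_in_bounds(int use_old_fixup)
{
    static const char doc[] = "<root><item attr=></root>";
    DemoInput in = { doc, sizeof doc, 12 };
    const char *old_base = in.base;
    size_t saved = in.cur;

    /* The saved offset was valid for the original buffer. */
    assert(saved < sizeof doc);

    demo_halt(&in);
    if (use_old_fixup)
        demo_fixup_old(&in, old_base, saved);
    else
        demo_fixup_new(&in, old_base, saved);

    return in.cur < in.size;
}
```

In this model, demo_cursor_in_bounds(1) returns 0 because offset 12 is applied to a one-byte buffer, while demo_cursor_in_bounds(0) returns 1 because the read position stays on the NUL byte. In the real library the equivalent of the first case is a pointer into unrelated memory, which is why the crash depends on address layout.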

--
Andrew (irc:RhodiumToad)


Sorry for the top-posting and spelling; T9 and mobile Gmail are not very usable.

Some notes.

If I set xmloption to document, this code works as expected:
postgres=# select d::xml from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
....
postgres=# select xml_is_well_formed(d) from convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);
 xml_is_well_formed
--------------------
 t
(1 row)

but all other XML functions still crash the server, for example:
postgres=# select  xpath_exists('//СвЮЛ'::text,d::xml) from convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251') g(d);

There are two different parsing methods involved:

 xmlCtxtReadDoc versus xmlParseBalancedChunkMemory

The problem is with xmlParseBalancedChunkMemory.

Regards

Pavel


--
--Regards, Sergey Mirvoda
