Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
Дата
Msg-id CAFj8pRCD8=AzbRJQ2_2rp3+uzade9GHbm0DAF3-t__yVtzo2cA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers


2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
Hi, thanks for the new patch.

# The patch is missing xpath_parser.h. That of the first patch was usable.

At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule <pavel.stehule@gmail.com> wrote in <CAFj8pRBMQa07a=+qQAVMtz5M_hqkJBhiQSOP76+-BrFDj37pvg@mail.gmail.com>
> Hi
>
> now xpath and xpath_exists supports default namespace too

At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule <pavel.stehule@gmail.com> wrote in <CAFj8pRCZ8oneG7g2vxs9ux71n8A9twwUO7zQpJiuz+7RGSpSuw@mail.gmail.com>
> > 1. Uniformity among simliar features
> >
> >   As mentioned in the proposal, but it is lack of uniformity that
> >   the xpath transformer is applied only to xmltable and not for
> >   other xpath related functions.
> >
>
> I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> support namespaces/

Sorry, I forgot to care about that. (And the definition of
namespace array is of course fabricated by me). I'd like to leave
this to committers.  Anyway it is working but the syntax (or
whether it is acceptable) is still arguable.

SELECT xpath('/a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
             ARRAY[ARRAY['', 'http://example.com']]);
|  xpath
| --------
|  {test}
| (1 row)


The internal name is properly rejected, but the current internal
name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
are preserving some short names and reject them as
user-defined. Doesn't just 'pgsqlxml' work?

LibXML2 does trim to 100 bytes length names. So pgdefnamespace.pgsqlxml.internal is safe from this perspective.

I would to decraese a risk of possible collision, so longer string is better. Maybe "pgsqlxml.internal" is good enoug - I have not a idea. But if somewhere will be this string printed, then
"pgdefnamespace.pgsqlxml.internal" has clean semantic, and it is reason, why I prefer this string. PostgreSQL uses 63 bytes names - and this string is correct too.



Default namespace correctly become to be applied on bare
attribute names.

> updated doc,
> fixed all variants of expected result test file

Sorry for one by one comment but I found another misbehavior.

create table t1 (id int, doc xml);
insert into t1
   values
   (5, '<rows xmlns="http://x.y"><row><a hoge="haha">50</a></row></rows>');
select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x), '/x:rows/x:row' passing t1.doc columns data int PATH 'child::x:a[1][attribute::hoge="haha"]') as x;
|  data
| ------
|    50

but the following fails.

select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'), '/rows/row' passing t1.doc columns data int PATH 'child::a[1][attribute::hoge="haha"]') as x;
|  data
| ------
|
| (1 row)

Perhaps child::a is not prefixed by the transformation.

XPath might be complex enough so that it's worth switching to
yacc/lex based transformer that is formally verifiable and won't
need a bunch of cryptic tests that finally cannot prove the
completeness. synchronous_standy_names is far simpler than XPath
but using yacc/lex parser.

I don't think (not yet) - it is simple state machine now, and when the code will be stable, then will not be modified.

Thank you for comments, I'll look on it

Regards

Pavel



Anyway the following is nitpicking of the current xpath_parser.c.

- NODENAME_FIRSTCHAR allows '-' as the first char but it is
  excluded from NameStartChar (https://www.w3.org/TR/REC-xml/#NT-NameStartChar)
  I think characters with high-bit set is okay.
  Also IS_NODENAME_CHAR should be changed.

- NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
  but have different naming schemes. Can these are named in the same way?

- The current transoformer seems to using up to one token stack
  depth. Maybe the stack is needless. (pushed token is always
  popped just after)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Borodin
Дата:
Сообщение: Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE